APO-SYS workshop on data analysis and pathway charting. Igor Ulitsky, Ron Shamir's Computational Genomics Group.

Part I: Presentations: EXPANDER, AMADEUS, SPIKE, MATISSE

Part II: Hands-on session: EXPANDER, MATISSE, SPIKE

EXPANDER: EXPression ANalyzer and DisplayER. Adi Maron-Katz, Chaim Linhart, Amos Tanay, Rani Elkon, Israel Steinfeld, Seagull Shavit, Igor Ulitsky, Roded Sharan, Yossi Shiloh, Ron Shamir. http://acgt.cs.tau.ac.il/expander

EXPANDER
– Low-level analysis:
  – Missing data estimation (KNN or manual)
  – Normalization: quantile, loess
  – Filtering: fold change, variation, t-test
  – Standardization: mean 0 and std 1, log transform, fixed norm
– High-level gene partition analysis: clustering, biclustering
– Ascribing biological meaning to patterns:
  – Enriched functional categories (Gene Ontology)
  – Identification of transcriptional regulators (promoter analysis)
– Built-in support for 9 organisms: human, mouse, rat, chicken, zebrafish, fly, worm, Arabidopsis, yeast
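The slides do not spell out how EXPANDER's KNN estimation works. As a rough illustration only, the Python sketch below fills each missing value with the average of that condition's values in the k most similar genes, using a root-mean-square distance over the conditions both genes have measured. The function name knn_impute, the distance choice and the default k=3 are assumptions, not EXPANDER's implementation.

```python
import math


def knn_impute(matrix, k=3):
    """Return a copy of a genes x conditions matrix with None entries filled in.

    Hypothetical sketch: each missing value is the mean of that condition's
    values in the k nearest genes, where distance is the root-mean-square
    difference over the conditions both genes have measured.
    """
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank every other gene by its distance to gene i.
        neighbours = []
        for g, other in enumerate(matrix):
            if g == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            neighbours.append((d, g))
        neighbours.sort()
        for j in missing:
            # Average the k closest genes that were measured in condition j.
            vals = [matrix[g][j] for _, g in neighbours if matrix[g][j] is not None][:k]
            imputed[i][j] = sum(vals) / len(vals) if vals else None
    return imputed
```

The original matrix is left untouched; a gene with no usable neighbours keeps None rather than receiving an arbitrary value.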

EXPANDER modules (diagram): input data; normalization/filtering; clustering (CLICK, SOM, K-means, hierarchical); biclustering (SAMBA); functional enrichment (TANGO); promoter signals (PRIMA); links to public annotation databases; visualization utilities.

EXPANDER – Preprocessing
Input data:
– Expression matrix (rows: probes; columns: conditions)
  – One-channel data (e.g., Affymetrix)
  – Dual-channel data (cDNA microarrays; values are (log) ratios between the red and green channels)
  – ‘.cel’ files
– ID conversion file: maps probes to genes
– Gene sets data
Data definitions:
– Defining condition subsets
– Data type and scale (log)
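The slides do not give the exact file layouts. Assuming a tab-delimited expression matrix with a header row of condition names and one probe per row, plus a two-column tab-delimited probe-to-gene ID conversion file, loading them could look like the Python sketch below. Both formats and both function names are hypothetical.

```python
import csv


def read_expression_matrix(lines):
    """Parse a hypothetical tab-delimited matrix: header = ID column + conditions.

    Returns (condition names, {probe ID: list of values}); empty cells
    become None so they can be handled by missing-value estimation later.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    conditions = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(v) if v.strip() else None for v in row[1:]]
    return conditions, matrix


def read_id_conversion(lines):
    """Parse a hypothetical two-column file: probe ID <tab> gene identifier."""
    reader = csv.reader(lines, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}
```

Both functions accept any iterable of text lines, so an open file handle works as well as an in-memory list.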

EXPANDER – Preprocessing (II)
Data adjustments:
– Missing value estimation (KNN or arbitrary)
– Merging conditions
Normalization: removal of systematic biases from the analyzed chips
– Implemented methods: quantile, lowess
– Visualization: box plots, scatter plots (simple, M vs. A)
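Quantile normalization forces every chip to share one intensity distribution. A minimal Python sketch of the textbook procedure follows: each value is replaced by the mean, across chips, of the values holding the same rank. The function name is hypothetical, ties are broken by position as a simplification, and this is not EXPANDER's own code; lowess normalization is not shown.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one column per chip)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the i-th smallest value over all chips.
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    result = []
    for col in columns:
        # Indices of this chip's values from smallest to largest (ties by position).
        order = sorted(range(n), key=lambda idx: col[idx])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result
```

After normalization the box plots of all chips coincide, since each chip holds exactly the same set of values in a different order.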

EXPANDER – Preprocessing (III)
Filtering: focus downstream analysis on the set of “responding genes”
– Fold change
– Variation
– Statistical tests (t-test)
Standardization: create a common scale
– For each probe: mean = 0, STD = 1
– Log data (base 2)
– Fixed norm (divide by the norm of the probe vector)
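The three standardization options act on one probe's vector of values across conditions. The Python sketch below shows each as a separate function; the names are hypothetical, and using the population standard deviation (dividing by n) is an assumption, since the slides do not say which variant EXPANDER uses.

```python
import math


def standardize(row):
    """Shift and scale one probe's values to mean 0 and (population) std 1."""
    mean = sum(row) / len(row)
    std = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
    # A constant probe has no spread; map it to all zeros instead of dividing by 0.
    return [(x - mean) / std for x in row] if std > 0 else [0.0] * len(row)


def log2_transform(row):
    """Base-2 log of strictly positive intensities."""
    return [math.log2(x) for x in row]


def fixed_norm(row):
    """Divide the probe vector by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else row[:]
```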

Cluster Analysis
Partition the responding genes into distinct sets, each with a particular expression pattern.
– Identify major patterns in the data, reducing the dimensionality of the problem
– co-expression → co-function
– co-expression → co-regulation
Partition the genes to achieve:
– Homogeneity: genes inside a cluster show highly similar expression patterns.
– Separation: genes from different clusters have different expression patterns.
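Homogeneity and separation can be turned into numbers. The Python sketch below uses Pearson correlation as the similarity, takes homogeneity as the average similarity of each gene to its own cluster's mean profile, and takes separation as the size-weighted average similarity between cluster mean profiles (lower means better separated). These particular definitions and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two profiles (undefined for constant profiles)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)


def centroid(profiles):
    """Mean expression profile of a cluster."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def homogeneity(clusters):
    """Average similarity of each gene to the centroid of its own cluster."""
    total, count = 0.0, 0
    for members in clusters:
        c = centroid(members)
        for gene in members:
            total += pearson(gene, c)
            count += 1
    return total / count


def separation(clusters):
    """Size-weighted average similarity between cluster centroids."""
    cents = [centroid(m) for m in clusters]
    num = den = 0.0
    for i, j in combinations(range(len(clusters)), 2):
        w = len(clusters[i]) * len(clusters[j])
        num += w * pearson(cents[i], cents[j])
        den += w
    return num / den
```

A good partition scores high on homogeneity and low on separation.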

Cluster Analysis (II)
Implemented algorithms:
– CLICK, K-means, SOM, hierarchical
Visualization:
– Mean expression patterns
– Heat maps
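Of the listed algorithms, K-means is the simplest to sketch. The Python code below is plain Lloyd's iteration on squared Euclidean distance; initializing with the first k profiles is a naive choice made here for reproducibility, and the code is not EXPANDER's K-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; returns (cluster label per point, cluster centers)."""
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers
```

In practice K-means is usually run from several random starts and the best result is kept, because the outcome depends on initialization.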

Example study: responses to ionizing radiation
(Diagram: ionizing radiation → double-strand breaks → sensors (ATM) → effectors (p53, BRCA1, CHK2) → DNA repair, cell cycle arrest, stress responses, survival pathways, apoptosis/cell death pathways.)

Example study: experimental design
– Genotypes: Atm-/- and control wild-type (w.t.) mice
– Tissue: lymph node
– Treatment: ionizing radiation
– Time points: 0, 30 min, 120 min
– Microarrays: Affymetrix U74Av2 (12k probe sets)

Test case – Data Analysis
– Dataset: six conditions (2 genotypes × 3 time points)
– Normalization
– Filtering step: define the “responding genes” set as the genes whose expression level changed by at least 1.75-fold. Over 700 genes met this criterion.
– The set contains genes with various response patterns; we applied CLICK to it.
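The slides give the 1.75-fold cutoff but not which conditions are compared. The Python sketch below assumes each irradiated time point is compared with the untreated (time 0) sample of the same genotype, and that a gene responds if any such comparison changes by at least 1.75-fold in either direction. The condition names and the baseline mapping are hypothetical.

```python
import math

FOLD_CUTOFF = 1.75  # threshold stated for the example study


def is_responding(values, baseline_of, threshold=FOLD_CUTOFF):
    """True if any condition differs from its baseline by >= threshold-fold.

    values: condition name -> expression level (linear scale, > 0).
    baseline_of: condition name -> name of its reference condition (hypothetical).
    Up- and down-regulation are treated symmetrically via |log2 ratio|.
    """
    cutoff = math.log2(threshold)
    return any(abs(math.log2(values[cond] / values[ref])) >= cutoff
               for cond, ref in baseline_of.items())


# Hypothetical condition names for the 2 genotypes x 3 time points design.
BASELINE = {"wt_30": "wt_0", "wt_120": "wt_0", "atm_30": "atm_0", "atm_120": "atm_0"}
```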

Major Gene Clusters – Irradiated Lymph Node: Atm-dependent early-responding genes

Major Gene Clusters – Irradiated Lymph Node: Atm-dependent second wave of responding genes

Ascribe Functional Meaning to the Clusters
– Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish and yeast
– TANGO: applies statistical tests that seek over-represented GO functional categories in the clusters
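The slides do not state TANGO's exact statistics. A standard way to score the over-representation of a single GO category in a cluster is the hypergeometric tail probability, sketched below in Python; the function name and argument order are our own.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K in the category,
    n genes in the cluster; X = number of cluster genes in the category."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
```

For instance, hypergeom_sf(8, 700, 20, 50) is the probability of seeing 8 or more members of a 20-gene category in a 50-gene cluster drawn from 700 responding genes; these numbers are illustrative only.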

Enriched GO Functional Categories (TANGO)
The GO hierarchical structure → highly dependent categories. Problems:
– High redundancy
– Multiple testing corrections assume independent tests
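One generic way around the dependence between categories is to calibrate p-values by resampling rather than by a formula that assumes independence: draw many random gene sets of the cluster's size, record the best (smallest) category p-value in each, and report how often chance alone does at least as well. The Python sketch below shows that idea; it is not TANGO's published procedure, and all names and defaults are assumptions.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw of n genes from N, K of them annotated."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def empirical_corrected_p(observed_p, background, category_sets, set_size,
                          n_perm=1000, seed=0):
    """Fraction of random gene sets whose best category p-value is <= observed_p.

    background: list of gene IDs; category_sets: list of sets of gene IDs.
    The +1 terms keep the estimate away from an impossible p-value of 0.
    """
    rng = random.Random(seed)
    N = len(background)
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(background, set_size))
        best = min(hypergeom_sf(len(sample & cat), N, len(cat), set_size)
                   for cat in category_sets)
        if best <= observed_p:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the random sets are scored against all categories at once, correlated categories are handled implicitly: they just make the minimum p-value less extreme than independence would predict.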

Functional Enrichment - Visualization

Functional categories: cell cycle control (p < 1x10^-6)

Functional categories: cell cycle control (p < 5x10^-6), apoptosis (p = 0.001)

Identify Transcriptional Regulators: the clues are in the promoters
(Diagram: a hidden layer of regulators, namely ATM, p53, TF-A, TF-B, TF-C, a factor marked “NEW” and several unknown factors marked “?”, above an observed layer of genes g1 to g13.)

‘Reverse engineering’ of transcriptional networks
Infers regulatory mechanisms from gene expression data.
– Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
Step 1: Identify co-expressed genes using microarray technology (clustering algorithms).
Step 2: Computationally identify cis-regulatory elements that are over-represented in the promoters of the co-expressed genes.

PRIMA – General Description
Input:
– Target set (e.g., co-expressed genes)
– Background set (e.g., all genes on the chip)
Analysis:
– Identify transcription factors whose binding-site signatures are enriched in the target set with respect to the background set
– TF binding site models: TRANSFAC DB
– Default promoter region: from -1000 bp to +200 bp relative to the TSS
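PRIMA's scoring details are not on the slides. A common building block for this kind of analysis is a log-odds position weight matrix (PWM) scanned along each promoter, sketched below in Python. It assumes count matrices in the style of TRANSFAC (one count per base per motif position), a uniform 0.25 background, a 0.5 pseudocount, uppercase ACGT sequences, and the forward strand only. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts (list of dicts) into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(seq, pwm):
    """Highest motif score over all positions of seq (forward strand only)."""
    w = len(pwm)
    best = float("-inf")
    for i in range(len(seq) - w + 1):
        best = max(best, sum(pwm[j][seq[i + j]] for j in range(w)))
    return best


def promoter_window(seq, tss_index, start=-1000, end=200):
    """Slice seq from `start` to `end` bp relative to the TSS at tss_index."""
    return seq[max(0, tss_index + start): tss_index + end]
```

A gene would then count as a hit for a factor when best_hit on its promoter window exceeds some score threshold; choosing that threshold is a separate modelling decision not shown here.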

Promoter Analysis - Visualization

PRIMA - Results

Transcription factor: enrichment factor, p-value
NF-κB: 5.1, 3.8x10^-8
p53: 4.2, 9.6x10^-7
STAT-1: 3.2, 5.4x10^-6
CREB: 2.6, 6.0x10^-5
Sp-1: 1.7, 6.5x10^-4
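The two columns of the results table can be related to simple counts. Assuming the enrichment factor is the fraction of target genes with a binding-site hit divided by the same fraction in the background (our reading; PRIMA's exact definition may differ), the Python sketch below computes it together with a hypergeometric p-value, treating the target set as drawn from the background. The function names are hypothetical.

```python
from math import comb


def enrichment_factor(target_hits, target_size, bg_hits, bg_size):
    """Hit frequency in the target set relative to the background set."""
    return (target_hits / target_size) / (bg_hits / bg_size)


def enrichment_p_value(target_hits, target_size, bg_hits, bg_size):
    """Hypergeometric P(X >= target_hits) for a target set drawn from the background."""
    upper = min(bg_hits, target_size)
    return sum(comb(bg_hits, i) * comb(bg_size - bg_hits, target_size - i)
               for i in range(target_hits, upper + 1)) / comb(bg_size, target_size)
```

With toy counts of 20 hits among 100 target genes versus 500 among 10,000 background genes, the enrichment factor is 4.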

Biclustering
– Clustering becomes too restrictive on large datasets: it seeks a global partition of the genes according to the similarity of their expression across ALL conditions.
– Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions.
– Biclustering algorithmic approach

Biclustering: SAMBA
Statistical Algorithmic Method for Bicluster Analysis (A. Tanay, R. Sharan, R. Shamir, RECOMB 02)
– Bicluster (= module): a subset of genes with similar behavior in a subset of conditions
– Computationally challenging: has to consider many combinations of sub-conditions
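To make the notion of a bicluster concrete, the Python sketch below builds a gene-condition bipartite graph (an edge where the standardized value is at least 1.0 in absolute value) and scores a candidate gene × condition subset with a log-likelihood ratio: a dense model with edge probability p_c = 0.9 against the background edge density. This is only in the spirit of a statistical bicluster score. It does not reproduce SAMBA's model or its search, and the threshold, p_c and function names are assumptions.

```python
import math


def bipartite_edges(matrix, threshold=1.0):
    """Edges (gene, condition) where |standardized value| >= threshold."""
    return {(g, c) for g, row in enumerate(matrix)
            for c, v in enumerate(row) if abs(v) >= threshold}


def bicluster_score(edges, genes, conds, p_bg, p_c=0.9):
    """Log-likelihood ratio of a dense-bicluster model vs. background density p_bg.

    Each edge inside the candidate adds log(p_c / p_bg); each missing edge adds
    log((1 - p_c) / (1 - p_bg)), which is negative when p_c > p_bg.
    """
    inside = sum((g, c) in edges for g in genes for c in conds)
    missing = len(genes) * len(conds) - inside
    return inside * math.log(p_c / p_bg) + missing * math.log((1 - p_c) / (1 - p_bg))
```

A positive score means the candidate looks more like a dense module than like random background; finding the best-scoring subsets is the hard combinatorial part the slide refers to.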

Biclustering Visualization

Expression Data – Input File (rows: probes; columns: conditions)

ID Conversion File

Normalization: Box Plots
(Plot: y-axis log(intensity); each box shows the median intensity and the upper and lower quartiles.)
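Each box in such a plot summarizes one chip by the quartiles of its log intensities. The Python sketch below computes those three numbers with the standard library; using base-2 logs and the "inclusive" quantile method are assumptions, and other tools may interpolate quartiles slightly differently.

```python
import math
import statistics


def box_summary(intensities):
    """Lower quartile, median and upper quartile of log2 intensities for one chip."""
    logs = [math.log2(x) for x in intensities]
    q1, median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    return {"lower_quartile": q1, "median": median, "upper_quartile": q3}
```

Comparing these summaries across chips before and after normalization shows whether systematic intensity differences were removed.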

Standardization of Expression Levels (panels: before and after standardization)

Cluster Analysis: Visualization (I)

Cluster Analysis: Visualization (II)
(Panels: before and after clustering; Clusters I, II and III.)
