APO-SYS workshop on data analysis and pathway charting. Igor Ulitsky, Ron Shamir's Computational Genomics Group.

1 APO-SYS workshop on data analysis and pathway charting. Igor Ulitsky, Ron Shamir's Computational Genomics Group

2 Part I: Presentations
- EXPANDER
- AMADEUS
- SPIKE
- MATISSE

3 Part II: Hands-on Session
- EXPANDER
- MATISSE
- SPIKE

4 EXPression ANalyzer and DisplayER
Adi Maron-Katz, Chaim Linhart, Amos Tanay, Rani Elkon, Israel Steinfeld, Seagull Shavit, Igor Ulitsky, Roded Sharan, Yossi Shiloh, Ron Shamir
http://acgt.cs.tau.ac.il/expander

5 EXPANDER
- Low level analysis:
  - Missing data estimation (KNN or manual)
  - Normalization: quantile, loess
  - Filtering: fold change, variation, t-test
  - Standardization: mean 0 std 1, take log, fixed norm
- High level gene partition analysis:
  - Clustering
  - Biclustering
- Ascribing biological meaning to patterns:
  - Enriched functional categories (Gene Ontology)
  - Identify transcriptional regulators: promoter analysis
- Built-in support for 9 organisms: human, mouse, rat, chicken, zebrafish, fly, worm, arabidopsis, yeast

6 [Diagram: EXPANDER overview. Input data; Normalization/Filtering; Clustering (CLICK, SOM, K-means, Hierarchical); Biclustering (SAMBA); Functional enrichment (TANGO); Promoter signals (PRIMA); Visualization utilities; Links to public annotation databases]

7 EXPANDER - Preprocessing
Input data:
- Expression matrix (probe-row; condition-column)
  - One-channel data (e.g., Affymetrix)
  - Dual-channel data (cDNA microarrays; data are (log) ratios between the Red and Green channels)
  - ‘.cel’ files
- ID conversion file: map probes to genes
- Gene sets data
Data definitions:
- Defining condition subsets
- Data type & scale (log)

8 EXPANDER – Preprocessing (II)
Data Adjustments:
- Missing value estimation (KNN or arbitrary)
- Merging conditions
Normalization: removal of systematic biases from the analyzed chips
- Implemented methods: quantile, lowess
- Visualization: box plots, scatter plots (simple, M vs. A)
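To make the quantile method named on this slide concrete, here is a minimal sketch of textbook quantile normalization in plain Python. It assumes a probes-by-conditions matrix stored as a list of rows with no missing values, and it does not treat ties specially. It is not EXPANDER's own implementation, and the function name is hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns (chips) of a probes x conditions matrix.

    Assumption: `matrix` is a list of equal-length rows of numbers with no
    missing values. Tied values are ranked in input order, not averaged.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # For each column, the row indices ordered from smallest to largest value.
    orders = []
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        orders.append(sorted(range(n_rows), key=column.__getitem__))

    # Reference distribution: mean of the k-th smallest value over all columns.
    reference = [
        sum(matrix[orders[j][k]][j] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]

    # Replace every value by the reference value of its within-column rank.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(orders[j]):
            result[i][j] = reference[rank]
    return result


chips = [[2.0, 4.0], [1.0, 8.0], [3.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization every column holds the same set of values, which is what removes chip-wide intensity biases.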

9 EXPANDER – Preprocessing (III)
Filtering: focus downstream analysis on the set of “responding genes”
- Fold-Change
- Variation
- Statistical tests (T-test)
Standardization: create a common scale
- For each probe: Mean=0, STD=1
- Log data (base 2)
- Fixed Norm (divide by norm of probe vector)
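The three standardization options listed above can be sketched per probe as follows. This is an illustration only: the use of the population standard deviation and of the Euclidean norm are assumptions, since the slide does not say which variants EXPANDER uses, and the function names are hypothetical.

```python
import math


def standardize(values):
    """Shift and scale one probe's values to mean 0 and standard deviation 1.

    Assumption: population standard deviation (divide by n).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant probe carries no pattern
    return [(v - mean) / std for v in values]


def log2_transform(values):
    """Take log base 2 of strictly positive intensities."""
    return [math.log2(v) for v in values]


def fixed_norm(values):
    """Divide one probe's vector by its Euclidean norm (assumed norm type)."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return list(values)
    return [v / norm for v in values]


z = standardize([1.0, 2.0, 3.0])
unit = fixed_norm([3.0, 4.0])
logged = log2_transform([1.0, 2.0, 8.0])
```

Each option puts probes with very different absolute intensities on a common scale, so that clustering compares the shape of the pattern and not its magnitude.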

10 [Diagram: EXPANDER overview, as on slide 6]

11 Cluster Analysis
Partition the responding genes into distinct sets, each with a particular expression pattern
- Identify major patterns in the data: reduce the dimensionality of the problem
- co-expression → co-function
- co-expression → co-regulation
Partition the genes to achieve:
- Homogeneity: genes inside a cluster show highly similar expression patterns.
- Separation: genes from different clusters have different expression patterns.
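The two goals above can be turned into numbers once a similarity measure is chosen. The sketch below assumes Pearson correlation as the similarity and reports the average similarity of same-cluster pairs (homogeneity) and of different-cluster pairs (a low value means good separation). These definitions, the function names, and the gene names are illustrative assumptions, not the exact scores used by CLICK or EXPANDER.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def homogeneity_and_separation(profiles, labels):
    """Average within-cluster and between-cluster pairwise similarity.

    profiles: dict gene -> expression pattern; labels: dict gene -> cluster id.
    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    within, between = [], []
    for g1, g2 in combinations(profiles, 2):
        r = pearson(profiles[g1], profiles[g2])
        if labels[g1] == labels[g2]:
            within.append(r)
        else:
            between.append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two rising genes and two falling genes.
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [3.0, 2.0, 1.0],
    "g4": [6.0, 4.0, 2.0],
}
labels = {"g1": "up", "g2": "up", "g3": "down", "g4": "down"}
homogeneity, separation = homogeneity_and_separation(profiles, labels)
```

A good partition has high within-cluster similarity and low between-cluster similarity; in this toy example the two values are at the extremes of the correlation range.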

12 Cluster Analysis (II)
Implemented algorithms:
- CLICK, K-means, SOM, Hierarchical
Visualization:
- Mean expression patterns
- Heat-maps

13 Example study: responses to ionizing radiation
[Diagram labels: Ionizing Radiation; Double Strand Breaks; Sensors; ATM; Effectors (p53, BRCA1, CHK2); DNA repair; Cell cycle arrest; Stress responses; Survival pathways; Apoptosis; Cell death pathways]

14 Example study: experimental design
- Genotypes: Atm-/- and control w.t. mice
- Tissue: Lymph node
- Treatment: Ionizing radiation
- Time points: 0, 30 min, 120 min
- Microarrays: Affymetrix U74Av2 (12k probesets)

15 Test case - Data Analysis
- Dataset: six conditions (2 genotypes, 3 time points)
- Normalization
- Filtering step: define the ‘responding genes’ set
  - Genes whose expression level changed by at least 1.75-fold
  - Over 700 genes met this criterion
- The set contains genes with various response patterns; we applied CLICK to this set of genes
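The filtering step can be sketched as a simple fold-change test. The slide does not say how the fold change was computed, so this example assumes linear-scale, strictly positive intensities and defines the fold change of a probe as its maximum value divided by its minimum value across conditions. The probe names and values are made up for illustration.

```python
def responding_probes(expression, min_fold=1.75):
    """Return probes whose max/min ratio across conditions is >= min_fold.

    expression: dict probe -> list of linear-scale intensities.
    Assumption: fold change = max / min; probes with non-positive
    intensities are skipped because the ratio is undefined for them.
    """
    selected = []
    for probe, values in expression.items():
        low, high = min(values), max(values)
        if low > 0 and high / low >= min_fold:
            selected.append(probe)
    return selected


# Hypothetical intensities over three conditions.
expression = {
    "probe_a": [100.0, 180.0, 120.0],  # 1.8-fold: kept
    "probe_b": [100.0, 120.0, 110.0],  # 1.2-fold: dropped
    "probe_c": [200.0, 100.0, 90.0],   # about 2.2-fold: kept
}
responders = responding_probes(expression)
```

With log2-scale data the same test would compare the difference between maximum and minimum against log2(1.75).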

16 Major Gene Clusters – Irradiated Lymph node: Atm-dependent early responding genes

17 Major Gene Clusters – Irradiated Lymph node: Atm-dependent 2nd wave of responding genes

18 [Diagram: EXPANDER overview, as on slide 6]

19 Ascribe Functional Meaning to the Clusters
- Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish and yeast.
- TANGO: Apply statistical tests that seek over-represented GO functional categories in the clusters.

20 Enriched GO Functional Categories
Hierarchical structure → highly dependent categories.
Problems:
- High redundancy
- Multiple testing corrections assume independent tests
TANGO
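For a single category and a single cluster, over-representation is commonly tested with a hypergeometric tail probability. The sketch below shows only that single test, with hypothetical parameter names. It does not handle the dependence between GO categories raised on this slide, and it is not a description of how TANGO corrects for multiple testing.

```python
from math import comb


def hypergeom_pvalue(population, annotated, cluster_size, hits):
    """P(X >= hits) when drawing cluster_size genes without replacement.

    population:   number of genes in the background set
    annotated:    background genes carrying the GO category
    cluster_size: number of genes in the cluster
    hits:         cluster genes carrying the GO category
    """
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 annotated, a cluster of 3 containing 2 annotated.
p = hypergeom_pvalue(population=10, annotated=4, cluster_size=3, hits=2)
```

Because GO categories are nested, many such tests are strongly correlated, which is why a correction that assumes independent tests can be misleading.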

21 Functional Enrichment - Visualization

22 Functional Categories: cell cycle control (p < 1x10^-6)

23 Functional Categories: Cell cycle control (p < 5x10^-6); Apoptosis (p = 0.001)

24 [Diagram: EXPANDER overview, as on slide 6]

25 Identify Transcriptional Regulators: clues are in the promoters
[Diagram labels: Hidden layer: ATM, p53, TF-A, TF-B, TF-C, NEW, with unknown links marked "?"; Observed layer: genes g1 to g13]

26 ‘Reverse engineering’ of transcriptional networks
Infers regulatory mechanisms from gene expression data
- Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
- Step 1: Identification of co-expressed genes using microarray technology (clustering algorithms)
- Step 2: Computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes

27 PRIMA – general description
Input:
- Target set (e.g., co-expressed genes)
- Background set (e.g., all genes on the chip)
Analysis:
- Identify transcription factors whose binding site signatures are enriched in the ‘Target set’ with respect to the ‘Background set’.
TF binding site models: TRANSFAC DB
Default region: from -1000 bp to +200 bp relative to the TSS
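Two small pieces of this description can be sketched in code: extracting the default promoter window around a TSS, and comparing hit frequencies between the target and background sets. Both functions are illustrative assumptions. The coordinate convention (a single genomic position for the TSS, with "+"/"-" strand) and the definition of the enrichment factor as a ratio of hit frequencies are not taken from the slides, and PRIMA's actual scoring and statistics are not reproduced here.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Genomic (start, end) of the region from -upstream to +downstream.

    Assumption: `tss` is a genomic coordinate and `strand` is "+" or "-".
    On the minus strand, upstream lies at higher genomic coordinates.
    """
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def enrichment_factor(target_hits, target_size, background_hits, background_size):
    """Ratio of binding-site hit frequency in target vs. background promoters.

    Assumed definition, for illustration only.
    """
    target_freq = target_hits / target_size
    background_freq = background_hits / background_size
    return target_freq / background_freq


plus_window = promoter_window(5000, "+")
minus_window = promoter_window(5000, "-")
factor = enrichment_factor(10, 50, 100, 1000)
```

Under this assumed definition, a factor of 2 would mean the binding site signature is found twice as often in target promoters as in background promoters.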

28 Promoter Analysis - Visualization

29 PRIMA - Results

30 PRIMA – Results

Transcription factor   Enrichment factor   P-value
NF-κB                  5.1                 3.8x10^-8
p53                    4.2                 9.6x10^-7
STAT-1                 3.2                 5.4x10^-6
Sp-1                   1.7                 6.5x10^-4
CREB                   2.6                 6.0x10^-5

31 [Diagram: EXPANDER overview, as on slide 6]

32 Biclustering
- Clustering becomes too restrictive on large datasets: it seeks a global partition of genes according to similarity in their expression across ALL conditions
- Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
- Biclustering algorithmic approach

33 Biclustering: SAMBA
Statistical Algorithmic Method for Bicluster Analysis
A. Tanay, R. Sharan, R. Shamir, RECOMB 02
- Bicluster (= module): subset of genes with similar behavior in a subset of conditions
- Computationally challenging: has to consider many combinations of sub-conditions
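The definition and the combinatorial difficulty can both be illustrated on toy data. The sketch below scores a candidate bicluster by the mean pairwise Pearson correlation of its genes over the chosen conditions only, and counts how many non-empty condition subsets exist. This scoring is a stand-in chosen for illustration; SAMBA itself uses a different, statistical model that is not shown here, and all names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bicluster_coherence(matrix, genes, conditions):
    """Mean pairwise correlation of `genes` restricted to `conditions`.

    matrix: list of rows (genes) over columns (conditions);
    genes, conditions: lists of row and column indices (at least 2 genes).
    """
    sub = [[matrix[g][c] for c in conditions] for g in genes]
    pairs = list(combinations(sub, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def count_condition_subsets(n_conditions):
    """Number of non-empty subsets of the conditions."""
    return 2 ** n_conditions - 1


# Two toy genes that agree on the first three conditions only.
data = [
    [1.0, 2.0, 3.0, 9.0, 0.0],
    [2.0, 4.0, 6.0, 0.0, 7.0],
]
on_subset = bicluster_coherence(data, [0, 1], [0, 1, 2])
on_all = bicluster_coherence(data, [0, 1], [0, 1, 2, 3, 4])
subsets_for_six = count_condition_subsets(6)
```

The two genes look unrelated over all five conditions but perfectly coherent over the first three, which is the kind of pattern a global clustering would miss. The subset count doubles with every added condition, so exhaustive search is not feasible on large datasets.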

34 Biclustering Visualization

35 Expression Data – Input File [Screenshot labels: probes; conditions]

36 ID Conversion File

37 Normalization: Box plots [Figure labels: Log (Intensity); Median intensity; Upper quartile; Lower quartile]

38 Standardization of Expression Levels [Figure panels: Before standardization; After standardization]

39 Cluster Analysis: Visualization (I)

40 Cluster Analysis - Visualization (II) [Figure panels: Before; After; Cluster I; Cluster II; Cluster III]

