1
APO-SYS workshop on data analysis and pathway charting
Igor Ulitsky, Ron Shamir's Computational Genomics Group
2
Part I: Presentations (EXPANDER, AMADEUS, SPIKE, MATISSE)
3
Part II: Hands-on Session (EXPANDER, MATISSE, SPIKE)
4
EXPANDER: EXPression ANalyzer and DisplayER
Adi Maron-Katz, Chaim Linhart, Amos Tanay, Rani Elkon, Israel Steinfeld, Seagull Shavit, Igor Ulitsky, Roded Sharan, Yossi Shiloh, Ron Shamir
http://acgt.cs.tau.ac.il/expander
5
EXPANDER
– Low-level analysis:
   Missing data estimation (KNN or manual)
   Normalization: quantile, loess
   Filtering: fold change, variation, t-test
   Standardization: mean 0 std 1, take log, fixed norm
– High-level gene partition analysis:
   Clustering
   Biclustering
– Ascribing biological meaning to patterns:
   Enriched functional categories (Gene Ontology)
   Identify transcriptional regulators: promoter analysis
Built-in support for 9 organisms:
– human, mouse, rat, chicken, zebrafish, fly, worm, Arabidopsis, yeast
6
Clustering (CLICK, SOM, K-means, Hierarchical) Input data Biclustering (SAMBA) Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA) Links to public annotation databases Visualization utilities
7
EXPANDER - Preprocessing
Input data:
– Expression matrix (probe-row; condition-column)
   One-channel data (e.g., Affymetrix)
   Dual-channel data (cDNA microarrays; data are (log) ratios between the Red and Green channels)
– '.cel' files
– ID conversion file: maps probes to genes
– Gene sets data
Data definitions:
– Defining condition subsets
– Data type & scale (log)
8
EXPANDER – Preprocessing (II)
Data adjustments:
– Missing value estimation (KNN or arbitrary)
– Merging conditions
Normalization: removal of systematic biases from the analyzed chips
– Implemented methods: quantile, lowess
– Visualization: box plots, scatter plots (simple, M vs. A)
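To make the quantile method concrete, here is a minimal sketch in plain Python. It is not EXPANDER's code: the function name `quantile_normalize` and the nested-list matrix layout (rows are probes, columns are conditions) are assumptions for illustration, and ties are broken by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Force every condition (column) to share the same value distribution.

    matrix: list of rows (probes), each a list with one value per condition.
    Hypothetical helper for illustration; not EXPANDER's implementation.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])

    # Sort the values of each column independently.
    sorted_cols = [sorted(row[j] for row in matrix) for j in range(n_cols)]

    # Reference distribution: the mean across columns at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / n_cols for rank in range(n_rows)
    ]

    # Replace each value by the reference value at its rank within its column.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: matrix[i][j])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result
```

After this step every column contains the same set of values, so box plots of the conditions line up; only the ranking of probes within each condition is retained.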
9
EXPANDER – Preprocessing (III)
Filtering: focus downstream analysis on the set of "responding genes"
– Fold-change
– Variation
– Statistical tests (t-test)
Standardization: create a common scale
– For each probe: mean = 0, STD = 1
– Log data (base 2)
– Fixed norm (divide by norm of probe vector)
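The three standardization options can be sketched per probe as follows. The function names are hypothetical, and two details are assumptions the slides do not specify: the population standard deviation is used, and the "norm" is taken to be the Euclidean norm.

```python
import math


def standardize(profile):
    """Shift and scale one probe's values to mean 0 and standard deviation 1.

    Assumption: population standard deviation (divide by n, not n - 1).
    """
    mean = sum(profile) / len(profile)
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / len(profile))
    if std == 0:
        # A flat profile carries no pattern; return zeros instead of dividing by 0.
        return [0.0] * len(profile)
    return [(x - mean) / std for x in profile]


def log2_transform(profile):
    """Take log base 2 of each value (values must be positive)."""
    return [math.log2(x) for x in profile]


def fixed_norm(profile):
    """Divide the probe vector by its norm (assumed Euclidean here)."""
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0:
        return list(profile)
    return [x / norm for x in profile]
```

For example, `log2_transform([1, 2, 8])` gives `[0.0, 1.0, 3.0]`, and `standardize` maps `[1, 2, 3]` and `[10, 20, 30]` to the same profile, which is what lets clustering compare patterns rather than magnitudes.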
10
Clustering (CLICK, SOM, K-means, Hierarchical) Input data Biclustering (SAMBA) Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA) Links to public annotation databases Visualization utilities
11
Cluster Analysis
Partition the responding genes into distinct sets, each with a particular expression pattern.
Identify major patterns in the data: reduce the dimensionality of the problem
– co-expression → co-function
– co-expression → co-regulation
Partition the genes to achieve:
– Homogeneity: genes inside a cluster show highly similar expression patterns.
– Separation: genes from different clusters have different expression patterns.
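One simple way to quantify these two goals is to compare the average similarity of gene pairs inside clusters with that of pairs from different clusters. The sketch below uses Pearson correlation as the similarity measure; this choice and the function names are assumptions for illustration, not the scores reported by EXPANDER or CLICK.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def homogeneity_and_separation(profiles, labels):
    """Return (mean within-cluster r, mean between-cluster r).

    profiles: list of expression profiles, one per gene.
    labels:   cluster label of each gene, in the same order.
    A good partition has a high first value and a low second value.
    """
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(r)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(within), mean(between)
```

Comparing these two numbers across clustering solutions (for instance, different values of K in K-means) gives a rough check on whether a partition is both tight and well separated.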
12
Cluster Analysis (II)
Implemented algorithms:
– CLICK, K-means, SOM, Hierarchical
Visualization:
– Mean expression patterns
– Heat-maps
13
Example study: responses to ionizing radiation
[Pathway diagram; labels: Ionizing Radiation; Double Strand Breaks; Sensors; ATM; Effectors (p53, BRCA1, CHK2); DNA repair; Cell cycle arrest; Stress responses; Survival pathways; Apoptosis; Cell death pathways]
14
Example study: experimental design
– Genotypes: Atm-/- and control w.t. mice
– Tissue: lymph node
– Treatment: ionizing radiation
– Time points: 0, 30 min, 120 min
– Microarrays: Affymetrix U74Av2 (12k probesets)
15
Test case - Data Analysis
– Dataset: six conditions (2 genotypes, 3 time points)
– Normalization
– Filtering step: define the set of 'responding genes' as genes whose expression level changed by at least 1.75-fold
– Over 700 genes met this criterion
– The set contains genes with various response patterns; we applied CLICK to this set of genes
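The filtering step can be sketched as below. The slide does not say which conditions are compared, so this sketch assumes the fold change is the ratio between a probe's highest and lowest value across all conditions, on linear-scale (not log) intensities. The function name and the probe IDs in the usage example are hypothetical.

```python
def fold_change_filter(expression, threshold=1.75):
    """Keep probes whose max/min ratio across conditions reaches the threshold.

    expression: dict mapping probe ID to a list of linear-scale intensities,
                one per condition.
    Assumption: fold change = max / min over all conditions.
    """
    kept = {}
    for probe, values in expression.items():
        lowest, highest = min(values), max(values)
        # Skip non-positive intensities, for which a ratio is meaningless.
        if lowest > 0 and highest / lowest >= threshold:
            kept[probe] = values
    return kept


# Made-up example values, not data from the study.
example = {
    "probe_a": [100, 180, 90],   # ratio 2.0, kept
    "probe_b": [100, 120, 110],  # ratio 1.2, dropped
}
responding = fold_change_filter(example)
```

On log2-scale data the same criterion becomes a difference: max minus min of at least log2(1.75), roughly 0.81.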
16
Major Gene Clusters – Irradiated Lymph node Atm-dependent early responding genes
17
Major Gene Clusters – Irradiated Lymph node Atm-dependent 2nd wave of responding genes
18
Clustering (CLICK, SOM, K-means, Hierarchical) Input data Biclustering (SAMBA) Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA) Links to public annotation databases Visualization utilities
19
Ascribe Functional Meaning to the Clusters
Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish and yeast.
TANGO: apply statistical tests that seek over-represented GO functional categories in the clusters.
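A standard test for over-representation of this kind is the hypergeometric tail probability, sketched below with the Python standard library. This is an illustration of the general idea only: the function and parameter names are hypothetical, and the sketch does not reproduce how TANGO corrects for testing many dependent GO categories.

```python
from math import comb


def hypergeom_enrichment_pvalue(
    cluster_size, annotated_in_cluster, population_size, annotated_in_population
):
    """P(X >= annotated_in_cluster) for a hypergeometric X.

    Models drawing `cluster_size` genes without replacement from a population
    of `population_size` genes, of which `annotated_in_population` carry the
    GO category being tested.
    """
    upper = min(cluster_size, annotated_in_population)
    total = comb(population_size, cluster_size)
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total
```

For example, if 5 of 10 background genes carry a category and a 5-gene cluster contains all of them, the p-value is 1/252, about 0.004. In practice the background should be the set of genes that could have entered the cluster (for instance, all genes on the chip), not the whole genome.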
20
Enriched GO Functional Categories (TANGO)
Hierarchical structure → highly dependent categories.
Problems:
– High redundancy
– Multiple testing corrections assume independent tests
21
Functional Enrichment - Visualization
22
Functional Categories: cell cycle control (p < 1x10^-6)
23
Functional Categories: cell cycle control (p < 5x10^-6); apoptosis (p = 0.001)
24
Clustering (CLICK, SOM, K-means, Hierarchical) Input data Biclustering (SAMBA) Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA) Links to public annotation databases Visualization utilities
25
Identify Transcriptional Regulators
Clues are in the promoters
[Network diagram; hidden layer labels: ATM, p53, TF-A, TF-B, TF-C, NEW, "?"; observed layer labels: genes g1 to g13]
26
'Reverse engineering' of transcriptional networks
Infers regulatory mechanisms from gene expression data
– Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
Step 1: Identification of co-expressed genes using microarray technology (clustering algorithms)
Step 2: Computational identification of cis-regulatory elements that are over-represented in the promoters of the co-expressed genes
27
PRIMA – general description
Input:
– Target set (e.g., co-expressed genes)
– Background set (e.g., all genes on the chip)
Analysis:
– Identify transcription factors whose binding site signatures are enriched in the 'Target set' with respect to the 'Background set'.
TF binding site models: TRANSFAC DB
Default promoter region: from -1000 bp to +200 bp relative to the TSS
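Two pieces of this description lend themselves to a small sketch: locating the default promoter window around a TSS, and comparing hit frequencies between the two sets. Both functions are hypothetical illustrations, not PRIMA's code. The strand handling assumes genomic coordinates that increase along the "+" strand, and the enrichment factor is assumed to be the ratio of hit frequencies in the target and background sets.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) genomic coordinates of the promoter region.

    For a "+" strand gene, upstream means smaller coordinates; for a "-"
    strand gene the window is mirrored around the TSS. Clipping at
    chromosome ends is not handled here.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def enrichment_factor(target_hits, target_size, background_hits, background_size):
    """Ratio of binding-site hit frequency in the target set to the background.

    Assumption: a 'hit' is a promoter containing at least one predicted
    binding site for the transcription factor in question.
    """
    target_freq = target_hits / target_size
    background_freq = background_hits / background_size
    return target_freq / background_freq
```

For a TSS at position 5000, the window is (4000, 5200) on the "+" strand and (4800, 6000) on the "-" strand. If 30 of 100 target promoters and 600 of 6000 background promoters contain a hit, the enrichment factor is 3.0; the significance of such a ratio would still need a statistical test.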
28
Promoter Analysis - Visualization
29
PRIMA - Results
30
PRIMA – Results
Transcription factor   Enrichment factor   P-value
CREB                   2.6                 6.0x10^-5
NF-κB                  5.1                 3.8x10^-8
p53                    4.2                 9.6x10^-7
STAT-1                 3.2                 5.4x10^-6
Sp-1                   1.7                 6.5x10^-4
31
Clustering (CLICK, SOM, K-means, Hierarchical) Input data Biclustering (SAMBA) Functional enrichment (TANGO) Normalization/ Filtering Promoter signals (PRIMA) Links to public annotation databases Visualization utilities
32
Biclustering
Clustering becomes too restrictive on large datasets:
– It seeks a global partition of genes according to similarity in their expression across ALL conditions
– Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
Biclustering: algorithmic approach
33
Biclustering: SAMBA
Statistical Algorithmic Method for Bicluster Analysis
A. Tanay, R. Sharan, R. Shamir, RECOMB 02
* Bicluster (= module): subset of genes with similar behavior in a subset of conditions
* Computationally challenging: has to consider many combinations of sub-conditions
34
Biclustering Visualization
35
Expression Data – Input File (rows: probes; columns: conditions)
36
ID Conversion File
37
Normalization: Box plots
[Plot labels: Log (Intensity); median intensity; upper quartile; lower quartile]
38
Standardization of Expression Levels
[Panels: before standardization; after standardization]
39
Cluster Analysis: Visualization (I)
40
Cluster Analysis - Visualization (II)
[Panels: before and after; Cluster I, Cluster II, Cluster III]