APO-SYS workshop on data analysis and pathway charting: Igor Ulitsky, Ron Shamir's Computational Genomics Group.


Part I: Presentations  EXPANDER  AMADEUS  SPIKE  MATISSE

Part II: Hands-on Session  EXPANDER  MATISSE  SPIKE

EXPression ANalyzer and DisplayER: Adi Maron-Katz, Chaim Linhart, Amos Tanay, Rani Elkon, Israel Steinfeld, Seagull Shavit, Igor Ulitsky, Roded Sharan, Yossi Shiloh, Ron Shamir

EXPANDER
– Low-level analysis: missing data estimation (KNN or manual); normalization (quantile, loess); filtering (fold change, variation, t-test); standardization (mean 0 and STD 1, take log, fixed norm)
– High-level gene partition analysis: clustering, biclustering
– Ascribing biological meaning to patterns: enriched functional categories (Gene Ontology); identification of transcriptional regulators (promoter analysis)
– Built-in support for 9 organisms: human, mouse, rat, chicken, zebrafish, fly, worm, Arabidopsis, yeast
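The slides do not say how EXPANDER's KNN missing data estimation weighs neighbors. A common formulation, assumed here, fills a gene's missing value with the plain average of that condition's values in the k most similar genes, where similarity is the root-mean-square difference over the conditions both genes observed. The helper name `knn_impute` and the default `k=2` are illustrative only.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one gene's profile; None marks a missing value


def distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over conditions observed in both genes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat as maximally distant
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix: List[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical helper: replace each missing value with the unweighted mean
    of the k nearest genes' values in the same condition."""
    result = []
    for i, row in enumerate(matrix):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # Neighbors: other genes that have a value for condition j.
            candidates = sorted(
                (distance(row, other), other[j])
                for t, other in enumerate(matrix)
                if t != i and other[j] is not None
            )
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                raise ValueError(f"no gene has a value for condition {j}")
            filled.append(sum(nearest) / len(nearest))
        result.append(filled)
    return result
```

For example, with rows `[1.0, 2.0, None]`, `[1.1, 2.1, 3.0]`, `[0.9, 1.9, 2.8]` and `[5.0, 6.0, 9.0]`, the missing entry is filled from the two similar genes (about 2.9), not from the distant fourth one.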

[Figure: EXPANDER modules: input data; normalization/filtering; clustering (CLICK, SOM, K-means, Hierarchical); biclustering (SAMBA); functional enrichment (TANGO); promoter signals (PRIMA); links to public annotation databases; visualization utilities]

EXPANDER – Preprocessing
Input data:
– Expression matrix (probes in rows, conditions in columns): one-channel data (e.g., Affymetrix) or dual-channel data (cDNA microarrays, where values are (log) ratios between the red and green channels)
– ‘.cel’ files
– ID conversion file: maps probes to genes
– Gene sets data
Data definitions:
– Defining condition subsets
– Data type and scale (log)
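The exact file layout is not shown on the slide. Assuming a tab-delimited expression matrix (a header row of condition names, then one row per probe) and a two-column, tab-delimited ID conversion file, loading both and mapping probes to genes might look like this. The function names, the layout, and the toy probe and gene IDs are assumptions, not EXPANDER's documented format.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_expression_matrix(handle) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse an assumed layout: header = probe column + condition names,
    followed by one row of numeric values per probe."""
    reader = csv.reader(handle, delimiter="\t")
    conditions = next(reader)[1:]
    matrix: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        probe, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(conditions):
            raise ValueError(f"{probe}: expected {len(conditions)} values")
        matrix[probe] = values
    return conditions, matrix


def read_id_conversion(handle) -> Dict[str, str]:
    """Parse an assumed two-column probe -> gene mapping."""
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


# Toy usage with in-memory text standing in for the two files.
matrix_text = "probe\tcond_1\tcond_2\nprobe_1\t7.1\t7.4\nprobe_2\t5.2\t8.9\n"
conversion_text = "probe_1\tgene_A\nprobe_2\tgene_B\n"
conditions, matrix = read_expression_matrix(io.StringIO(matrix_text))
probe_to_gene = read_id_conversion(io.StringIO(conversion_text))
genes = {probe: probe_to_gene.get(probe, "unmapped") for probe in matrix}
```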

EXPANDER – Preprocessing (II)  Data Adjustments: Missing value estimation (KNN or arbitrary) Merging conditions Normalization: removal of systematic biases from the analyzed chips  Implemented methods: quantile, lowess  Visualization: box plots, scatter plots (simple, M vs. A)box plots

EXPANDER – Preprocessing (III)  Filtering: Focus downstream analysis on the set of “responding genes”  Fold-Change  Variation  Statistical tests (T-test)  Standardization : Create a common scale Standardization  For each probe Mean=0, STD=1  Log data (base 2)  Fixed Norm (divide by norm of probe vector)


Cluster Analysis
Partition the responding genes into distinct sets, each with a particular expression pattern:
– Identify major patterns in the data: reduce the dimensionality of the problem
– co-expression → co-function
– co-expression → co-regulation
Partition the genes to achieve:
– Homogeneity: genes inside a cluster show highly similar expression patterns.
– Separation: genes from different clusters have different expression patterns.
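Homogeneity and separation can be turned into numbers. One common way, assumed here rather than taken from the slides, uses Pearson correlation: homogeneity is the average correlation of each gene with its own cluster's centroid (higher is better), and separation is the size-weighted average correlation between centroids of different clusters (lower is better). Profiles with zero variance are not handled.

```python
import math
from itertools import combinations
from typing import Dict, List

Profile = List[float]


def pearson(a: Profile, b: Profile) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def centroid(members: List[Profile]) -> Profile:
    return [sum(col) / len(members) for col in zip(*members)]


def homogeneity(clusters: Dict[str, List[Profile]]) -> float:
    """Average correlation between each gene and its own cluster centroid."""
    scores = [
        pearson(gene, centroid(members))
        for members in clusters.values()
        for gene in members
    ]
    return sum(scores) / len(scores)


def separation(clusters: Dict[str, List[Profile]]) -> float:
    """Size-weighted average correlation between centroids of distinct clusters."""
    cents = {name: centroid(m) for name, m in clusters.items()}
    num = den = 0.0
    for a, b in combinations(clusters, 2):
        weight = len(clusters[a]) * len(clusters[b])
        num += weight * pearson(cents[a], cents[b])
        den += weight
    return num / den
```

A partition into a rising cluster and a falling cluster, for example, scores homogeneity close to 1 and separation close to -1.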

Cluster Analysis (II)
Implemented algorithms: CLICK, K-means, SOM, Hierarchical
Visualization: mean expression patterns, heat-maps
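Of the listed algorithms, K-means is the simplest to sketch. EXPANDER's own settings are not given; this is plain Lloyd's iteration with Euclidean distance, seeded (an assumption made for reproducibility) with the first k profiles rather than random ones.

```python
import math
from typing import List, Tuple

Profile = List[float]


def kmeans(data: List[Profile], k: int, max_iter: int = 100) -> Tuple[List[int], List[Profile]]:
    """Lloyd's k-means: returns one cluster label per profile and the centroids."""
    centroids = [list(p) for p in data[:k]]  # deterministic seeding (assumption)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in data
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Unlike CLICK, K-means needs the number of clusters k in advance, and its result can depend on the seeding.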

Example study: responses to ionizing radiation
[Figure: ionizing radiation causes double strand breaks, detected by the sensor ATM, which activates effectors (p53, BRCA1, CHK2) that trigger DNA repair, cell cycle arrest, stress responses, survival pathways, and apoptosis/cell death pathways]

Example study: experimental design
– Genotypes: Atm-/- and control wild-type (w.t.) mice
– Tissue: lymph node
– Treatment: ionizing radiation
– Time points: 0, 30 min, 120 min
– Microarrays: Affymetrix U74Av2 (12k probesets)

Test case – Data Analysis
– Dataset: six conditions (2 genotypes × 3 time points)
– Normalization
– Filtering step: define the ‘responding genes’ set as the genes whose expression level changes by at least 1.75-fold. Over 700 genes met this criterion.
– The set contains genes with various response patterns, so we applied CLICK to it.
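The slide does not say which conditions the 1.75-fold change is measured between. This sketch assumes each condition is compared with a single baseline condition (for example the untreated time point) on linear-scale, strictly positive intensities, counting both up- and down-regulation. The function name and toy values are hypothetical.

```python
from typing import Dict, List


def fold_change_filter(
    matrix: Dict[str, List[float]], baseline: int = 0, threshold: float = 1.75
) -> List[str]:
    """Return probes whose level rises or falls at least `threshold`-fold in
    some condition relative to the baseline condition."""
    responding = []
    for probe, values in matrix.items():
        base = values[baseline]
        fold = max(
            max(v / base, base / v)  # up- or down-regulation, as a ratio >= 1
            for i, v in enumerate(values)
            if i != baseline
        )
        if fold >= threshold:
            responding.append(probe)
    return responding
```

With toy profiles `p1 = [100, 180, 150]`, `p2 = [100, 110, 95]` and `p3 = [200, 190, 100]`, only `p1` (1.8-fold up) and `p3` (2-fold down) pass. On log2-scale data the same test would be `abs(v - base) >= math.log2(1.75)`.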

Major Gene Clusters – Irradiated Lymph Node: Atm-dependent early responding genes

Major Gene Clusters – Irradiated Lymph Node: Atm-dependent 2nd wave of responding genes


Ascribe Functional Meaning to the Clusters
– Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish and yeast.
– TANGO: apply statistical tests that seek over-represented GO functional categories in the clusters.
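The slide does not name TANGO's test. A standard choice for this kind of over-representation question, assumed here, is the hypergeometric tail probability: with N background genes, K of them annotated with a GO category, and a cluster of n genes of which k carry the annotation, the p-value is P(X ≥ k), the chance that a random set of n genes contains at least k annotated ones.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)  # a cluster cannot contain more annotated genes than this
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, if 3 of 10 background genes carry a category and a 3-gene cluster contains all 3, the p-value is 1/120, since only one of the C(10, 3) = 120 possible clusters is that extreme.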

Enriched GO Functional Categories (TANGO)
Hierarchical structure → highly dependent categories. Problems:
– High redundancy
– Multiple testing corrections assume independent tests
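Because GO categories are nested, a correction that treats them as independent tests (such as Bonferroni) is overly conservative. One way to respect the dependence, sketched here as an assumption rather than as TANGO's documented procedure, is to compare a cluster's best (smallest) category p-value with the best p-values obtained for random gene sets of the same size drawn from the background. The category sets, function names and `n_random` default are illustrative.

```python
import random
from math import comb
from typing import Dict, Set


def hypergeom_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def min_category_p(genes: Set[str], categories: Dict[str, Set[str]], N: int) -> float:
    """Smallest enrichment p-value of any category for this gene set."""
    return min(
        hypergeom_p(N, len(members), len(genes), len(genes & members))
        for members in categories.values()
    )


def empirical_corrected_p(
    cluster: Set[str],
    background: Set[str],
    categories: Dict[str, Set[str]],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction (with +1 smoothing) of same-size random gene sets whose best
    category p-value is at least as small as the cluster's best p-value."""
    rng = random.Random(seed)
    N = len(background)
    observed = min_category_p(cluster, categories, N)
    pool = sorted(background)  # random.sample needs a sequence
    hits = sum(
        min_category_p(set(rng.sample(pool, len(cluster))), categories, N) <= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)
```

The random sets go through exactly the same nested categories as the real cluster, so the correlations between parent and child terms are built into the null distribution instead of being assumed away.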

Functional Enrichment - Visualization

Functional Categories: cell cycle control (p < 1×10^-6)

Functional Categories: cell cycle control (p < 5×10^-6), apoptosis (p = 0.001)


Identify Transcriptional Regulators: clues are in the promoters
[Figure: a hidden layer of regulators (ATM, p53, TF-A, TF-B, TF-C, and new, unidentified factors marked "?") acting on an observed layer of genes g1 to g13]

‘Reverse engineering’ of transcriptional networks
Infers regulatory mechanisms from gene expression data.
– Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
– Step 1: identification of co-expressed genes using microarray technology (clustering algorithms)
– Step 2: computational identification of cis-regulatory elements that are over-represented in the promoters of the co-expressed genes

PRIMA – general description
Input:
– Target set (e.g., co-expressed genes)
– Background set (e.g., all genes on the chip)
Analysis:
– Identify transcription factors whose binding site signatures are enriched in the ‘Target set’ with respect to the ‘Background set’.
TF binding site models: TRANSFAC DB
Default: from bp to 200 bp relative to the TSS
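PRIMA's scoring details are not on the slide. As an illustration only, assume each transcription factor is modeled by a position weight matrix (here a made-up 3-position matrix standing in for a TRANSFAC model), a promoter counts as a hit if any window scores above a log-odds threshold, and the enrichment factor is the hit rate in the target set divided by the hit rate in the background set. The matrix, threshold, sequences and names are hypothetical; a real analysis would also scan the reverse strand and attach a p-value.

```python
import math
from typing import Dict, List

PWM = List[Dict[str, float]]  # per motif position: base -> probability (no zeros)


def log_odds_score(window: str, pwm: PWM, background: float = 0.25) -> float:
    """Sum of log2(p(base) / background) over motif positions (uppercase ACGT)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def has_site(seq: str, pwm: PWM, threshold: float) -> bool:
    """True if any forward-strand window scores at or above the threshold."""
    w = len(pwm)
    return any(
        log_odds_score(seq[i:i + w], pwm) >= threshold
        for i in range(len(seq) - w + 1)
    )


def enrichment_factor(target: List[str], background: List[str], pwm: PWM, threshold: float) -> float:
    """Fraction of target promoters with a site / fraction of background promoters."""

    def hit_rate(seqs: List[str]) -> float:
        return sum(has_site(s, pwm, threshold) for s in seqs) / len(seqs)

    return hit_rate(target) / hit_rate(background)


# Hypothetical motif preferring "GGA".
g_col = {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}
a_col = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
toy_pwm = [g_col, g_col, a_col]
```

With target promoters `TTGGACC`, `AGGATT` and `CCCCCC` (2 of 3 with a site) and a background of 8 sequences containing 3 sites, the enrichment factor is (2/3) / (3/8) ≈ 1.78.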

Promoter Analysis - Visualization

PRIMA - Results

[Table: PRIMA results, listing transcription factors with enriched binding site signatures (including CREB, NF-κB, STAT and Sp factors) together with their enrichment factors and p-values]


Biclustering  Clustering becomes too restrictive on large datasets: Seeks global partition of genes according to similarity in their expression across ALL conditions  Relevant knowledge can be revealed by identifying genes with common pattern across a subset of the conditions Biclustering algorithmic approach

Biclustering: SAMBA (Statistical Algorithmic Method for Bicluster Analysis; A. Tanay, R. Sharan, R. Shamir, RECOMB 02)
– Bicluster (= module): a subset of genes with similar behavior in a subset of conditions
– Computationally challenging: has to consider many combinations of sub-conditions
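SAMBA views the data as a bipartite graph between genes and conditions, in which an edge marks a significant expression change, and searches for dense subgraphs. The sketch below shows only a deliberately simplified scoring step: edges come from a fixed cutoff on standardized values (an assumption), and a candidate bicluster is scored by the log-likelihood ratio of a "module" model (edge probability `p_module`) against a single background edge probability. The published method refines the probability model and adds a search procedure, neither of which is shown; the names and `p_module=0.9` are illustrative.

```python
import math
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (gene index, condition index)


def bipartite_edges(matrix: List[List[float]], cutoff: float) -> Set[Edge]:
    """Connect gene g to condition c when |value| >= cutoff."""
    return {
        (g, c)
        for g, row in enumerate(matrix)
        for c, value in enumerate(row)
        if abs(value) >= cutoff
    }


def bicluster_score(
    genes: Iterable[int],
    conditions: Iterable[int],
    edges: Set[Edge],
    p_background: float,
    p_module: float = 0.9,
) -> float:
    """Log-likelihood ratio: positive when the gene x condition block is denser
    in edges than the background rate would explain."""
    conditions = list(conditions)
    score = 0.0
    for g in genes:
        for c in conditions:
            if (g, c) in edges:
                score += math.log(p_module / p_background)
            else:
                score += math.log((1 - p_module) / (1 - p_background))
    return score
```

On a toy matrix where genes 0 to 2 respond strongly in conditions 0 and 1 only, that block scores positive, while a block mixing responding and non-responding pairs scores negative.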

Biclustering Visualization

Expression Data – Input File
[Figure: expression matrix, with probes as rows and conditions as columns]

ID Conversion File

Normalization: Box plots
[Figure: box plots of log(intensity) per chip, marking the median intensity and the upper and lower quartiles]

Standardization of Expression Levels
[Figure: expression profiles before and after standardization]

Cluster Analysis: Visualization (I)

Cluster Analysis – Visualization (II)
[Figure: clusters I, II and III, shown before and after]