MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia Armstrong et al, Nature Genetics 30, 41-47 (2002)

Blank slide/colon data

gene Hsa ' UTR 2a MYOSIN HEAVY CHAIN, NONMUSCLE (Gallus gallus)
tumor: mean = 0.73 std = 0.4
normal: mean = 2.41 std = 1.05

histograms HISTOGRAM, BINS OF 0.5

NORMALIZED (FREQUENCIES) mean = 0.73 std = 0.4; mean = 2.41 std = 1.05

t-test T = P = 10 e-14
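The slide does not say which form of the t-test was used. As a minimal sketch, assuming Welch's unequal-variance version, the statistic T can be computed from two groups of expression values as below. The sample values and the name `welch_t` are hypothetical and are not taken from the colon data; converting T to a p-value additionally needs the t-distribution, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # unbiased sample variances
    se2 = va / na + vb / nb  # squared standard error of the difference of means
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical expression values for one gene (assumption, not the slide's data).
tumor = [0.5, 0.9, 0.7, 1.1, 0.4, 0.8]
normal = [2.0, 3.1, 2.6, 1.9, 2.4]
t_stat, dof = welch_t(tumor, normal)
```

A large |T| means the difference of means is large compared with the spread inside each group, which is what the two histograms show visually.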

gene Hsa ' UTR 2a EUKARYOTIC INITIATION FACTOR 4B (Homo sapiens) mean = std = mean = std = tumor: normal:

histograms

NORMALIZED (FREQUENCIES)

t-test T = P = %

gene2000 Hsa.1829 gene 1 Human mRNA fragment for class II histocompatibility antigen beta-chain (pII-beta-4) tumor: normal: mean = std = mean = std = 1.536

histograms

NORMALIZED (FREQUENCIES)

t-test T = P =

colon data expression matrix E; log2 E, centered and normalized (C&N_log2E)
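The slide names the preprocessing steps but not their exact definition. A minimal sketch, assuming that rows are genes, that all values are positive, and that "normalize" means scaling each centered row to unit Euclidean norm (scaling to unit standard deviation is another common choice):

```python
import math


def center_and_normalize_log2(matrix):
    """log2-transform each row, subtract its mean, scale it to unit norm."""
    result = []
    for row in matrix:
        logged = [math.log2(x) for x in row]
        mu = sum(logged) / len(logged)
        centered = [v - mu for v in logged]
        norm = math.sqrt(sum(v * v for v in centered))
        # A constant row has zero norm; keep it as zeros instead of dividing by zero.
        result.append([v / norm for v in centered] if norm > 0 else centered)
    return result


# Hypothetical 2 genes x 3 samples matrix (assumption, not the colon data).
E = [[1.0, 2.0, 4.0], [8.0, 8.0, 8.0]]
cn_log2_e = center_and_normalize_log2(E)
```

After this step every gene has mean zero over the samples, so genes with very different absolute expression levels become comparable.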

genes ordered by p-value: 726 genes with p < 0.05, ordered by difference of means (normal – tumor)

after t-test: genes with p < 0.05, ordered by difference of means. RANDOM DATA

sorted p Q=0.15 I=758

how many out of 726 are false? 0.14 FDR: 726*0.14=101 false separating genes

how many genes at FDR=0.05? 516*0.05=26 false separating genes
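The slides do not name the procedure behind the sorted p-values and the threshold Q. A minimal sketch, assuming it is the Benjamini-Hochberg step-up rule: sort the N p-values, find the largest rank i with p(i) <= i*Q/N, and keep the i genes with the smallest p-values. The expected number of false separating genes is then (number kept) * Q, which is the arithmetic on the slides above. The p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return the indices of the hypotheses rejected at FDR level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the line rank * q / n.
        if p_values[idx] <= rank * q / n:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for 10 genes (assumption, not the colon data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p, q=0.05)
expected_false = len(selected) * 0.05
```

Note that five of these genes have p < 0.05, but only two survive the FDR cut, mirroring the drop from 726 to 516 genes on the slides.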

26 out of false 26 - false

random data

100 separating (p<0.001), 1900 random
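To see why random genes alone produce many apparent hits, recall that under the null hypothesis a p-value is uniformly distributed on [0, 1]. The sketch below simulates this for the 1900 random genes; the function name and the fixed seed are illustrative assumptions, and the uniform draw stands in for actually running a t-test on random data.

```python
import random


def count_false_positives(n_random_genes, alpha, seed=0):
    """Count simulated null p-values that fall below alpha."""
    rng = random.Random(seed)
    # Each null gene gets a p-value drawn uniformly from [0, 1).
    return sum(1 for _ in range(n_random_genes) if rng.random() < alpha)


n_false = count_false_positives(1900, 0.05)
expected = 1900 * 0.05  # about 95 random genes pass p < 0.05 by chance
```

With a threshold of p < 0.05, the number of random genes that pass is comparable to the 100 truly separating genes, which is the motivation for controlling the FDR instead.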

separation (axes E1, E2; classes ALL, MLL): THE LINE E1 - 2E2 = 0 SEPARATES THE REGION E1 - 2E2 < 0 FROM THE REGION E1 - 2E2 > 0
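A separating line turns classification into checking a sign. A minimal sketch for the line E1 - 2E2 = 0; the sample points are hypothetical, and which side corresponds to ALL and which to MLL cannot be recovered from the slide text.

```python
def side_of_line(e1, e2):
    """Return the sign of E1 - 2*E2: +1, -1, or 0 for a point on the line."""
    value = e1 - 2.0 * e2
    return (value > 0) - (value < 0)


# Hypothetical (E1, E2) expression pairs for three samples.
samples = [(3.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
sides = [side_of_line(e1, e2) for e1, e2 in samples]
```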

projection 1 (axes E1, E2; classes ALL, MLL; direction w, sides +/-): PROJECTIONS ON w DO SEPARATE ALL FROM MLL

projection 2 (axes E1, E2; classes ALL, MLL; direction w, sides +/-): PROJECTIONS ON w DO NOT SEPARATE ALL FROM MLL
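The two projection slides can be reproduced numerically. This is a sketch with two hypothetical elongated clouds: projecting on a direction across the gap separates them, while projecting along the elongated direction does not. The names `project` and `separated` are illustrative.

```python
import math


def project(points, w):
    """Scalar projection of each 2-D point onto the direction w."""
    length = math.hypot(w[0], w[1])
    ux, uy = w[0] / length, w[1] / length
    return [x * ux + y * uy for x, y in points]


def separated(proj_a, proj_b):
    """True if the two sets of projections occupy disjoint intervals."""
    return max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a)


# Hypothetical clouds, both elongated along the diagonal (assumption).
group_a = [(1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]
group_b = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.0)]
good_w = (1.0, -1.0)  # points across the gap between the clouds
bad_w = (1.0, 1.0)    # points along the long axis of both clouds

good = separated(project(group_a, good_w), project(group_b, good_w))
bad = separated(project(group_a, bad_w), project(group_b, bad_w))
```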

projection 3 (axes E1, E2): WELL SEPARATED CENTERS OF MASS, NO SEPARATION OF THE TWO CLOUDS

projection 4 (axes E1, E2): WEAK SEPARATION OF CENTERS OF MASS, GOOD SEPARATION OF THE TWO CLOUDS
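These two cases show that a good projection direction has to weigh the distance between the centers of mass against the spread of each cloud along that direction. Fisher's linear discriminant is the standard criterion that does this: it maximizes the squared distance between projected means divided by the projected within-class scatter, giving w proportional to Sw^-1 (m_a - m_b). A minimal 2-D sketch on hypothetical data:

```python
def mean2(points):
    """Center of mass of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter2(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (m_a - m_b) that maximizes Fisher's criterion."""
    ma, mb = mean2(class_a), mean2(class_b)
    axx, axy, ayy = scatter2(class_a, ma)
    bxx, bxy, byy = scatter2(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # within-class scatter Sw
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Apply the inverse of [[sxx, sxy], [sxy, syy]] to (dx, dy).
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)


# Hypothetical clouds elongated along the diagonal (assumption).
cloud_a = [(1.0, 3.0), (2.0, 4.2), (3.0, 4.8), (4.0, 6.0)]
cloud_b = [(2.0, 1.0), (3.0, 2.2), (4.0, 2.8), (5.0, 4.0)]
w = fisher_direction(cloud_a, cloud_b)
```

For these points the difference of means is (-1, 2), but the resulting w is close to the (-1, 1) direction, because the clouds are wide along the diagonal and narrow across it.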

Fisher to perceptron (axes E1, E2; classes ALL, MLL): OPTIMAL LINE TO PROJECT ON: FISHER, PERCEPTRON
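The slide only names the perceptron. As a sketch, assuming the classic error-driven update rule (add the misclassified point, times its label, to the weights), a 2-D perceptron can be trained as follows. The data, labels, and parameter defaults are hypothetical. For linearly separable data the loop stops once no point is misclassified.

```python
def train_perceptron(points, labels, epochs=1000, rate=1.0):
    """Learn (w1, w2, bias) so that sign(w.x + bias) matches labels in {+1, -1}."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        errors = 0
        for (x, y), label in zip(points, labels):
            # A point is wrong if it lies on the wrong side or on the boundary.
            if label * (w1 * x + w2 * y + bias) <= 0:
                w1 += rate * label * x
                w2 += rate * label * y
                bias += rate * label
                errors += 1
        if errors == 0:  # every point is on its correct side
            break
    return w1, w2, bias


# Hypothetical samples: label +1 for one class, -1 for the other (assumption).
train_points = [(1.0, 3.0), (2.0, 4.2), (3.0, 4.8), (4.0, 6.0),
                (2.0, 1.0), (3.0, 2.2), (4.0, 2.8), (5.0, 4.0)]
train_labels = [1, 1, 1, 1, -1, -1, -1, -1]
w1, w2, bias = train_perceptron(train_points, train_labels)
```

Unlike the Fisher direction, the perceptron looks only at misclassified points, so it returns some separating line, not necessarily the one with the best margin.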

UNSUPERVISED ANALYSIS: CLUSTERING
GOAL A: FIND GROUPS OF GENES THAT HAVE CORRELATED EXPRESSION PROFILES. THESE GENES ARE BELIEVED TO BELONG TO THE SAME BIOLOGICAL PROCESS.
GOAL B: DIVIDE TISSUES INTO GROUPS WITH SIMILAR GENE EXPRESSION PROFILES. THESE TISSUES ARE EXPECTED TO BE IN THE SAME BIOLOGICAL (CLINICAL) STATE.

Giraffe DEFINITION OF THE CLUSTERING PROBLEM

CLUSTER ANALYSIS YIELDS DENDROGRAM (figure: dendrogram, axis T = RESOLUTION)

Giraffe + Okapi BUT WHAT ABOUT THE OKAPI?

STATEMENT OF THE PROBLEM
GIVEN DATA POINTS X_i, i = 1, 2, ..., N, EMBEDDED IN D-DIMENSIONAL SPACE, IDENTIFY THE UNDERLYING STRUCTURE OF THE DATA.
AIMS:
– PARTITION THE DATA INTO M CLUSTERS, POINTS OF THE SAME CLUSTER BEING "MORE SIMILAR"; M ALSO TO BE DETERMINED!
– GENERATE DENDROGRAM, IDENTIFY SIGNIFICANT, "STABLE" CLUSTERS
"ILL POSED": WHAT IS "MORE SIMILAR"? RESOLUTION

CLUSTER ANALYSIS YIELDS DENDROGRAM (figure: dendrogram, axis T; LINEAR ORDERING OF DATA, YOUNG to OLD)

CLUSTERING METHODS
AGGLOMERATIVE HIERARCHICAL
– AVERAGE LINKAGE (GENES: EISEN ET AL., PNAS 1998)
CENTROID (REPRESENTATIVE)
– SELF ORGANIZED MAPS (KOHONEN 1997; GENES: GOLUB ET AL., SCIENCE 1999)
– K-MEANS (GENES: TAMAYO ET AL., PNAS 1999)
PHYSICALLY MOTIVATED
– DETERMINISTIC ANNEALING (ROSE ET AL., PRL 1990; GENES: ALON ET AL., PNAS 1999)
– SUPER-PARAMAGNETIC CLUSTERING (SPC) (BLATT ET AL.; GENES: GETZ ET AL., PHYSICA 2000, PNAS 2000)

Agglomerative Hierarchical Clustering
Distance between joined clusters: need to define the distance between the new cluster and the other clusters.
– Single Linkage: distance between closest pair.
– Complete Linkage: distance between farthest pair.
– Average Linkage: average distance between all pairs, or distance between cluster centers.
Dendrogram: the dendrogram induces a linear ordering of the data points.
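The three linkage rules differ only in how the pairwise distances between two clusters are combined. The following is a small, unoptimized sketch of the agglomerative loop with Euclidean distance; the function names and the data are hypothetical, and "average" here means the average over all pairs, not the distance between cluster centers.

```python
import math


def linkage_distance(cluster_a, cluster_b, points, method):
    """Distance between two clusters, given as lists of point indices."""
    pair_distances = [math.dist(points[i], points[j])
                      for i in cluster_a for j in cluster_b]
    if method == "single":
        return min(pair_distances)  # closest pair
    if method == "complete":
        return max(pair_distances)  # farthest pair
    if method == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, method="average"):
    """Repeatedly merge the closest pair of clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], points, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical 2-D data points (assumption).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0)]
history = agglomerate(points, method="single")
# Concatenating the two branches of the last merge gives one leaf ordering.
leaf_order = history[-1][0] + history[-1][1]
```

The merge distances in `history` are the heights at which branches join in the dendrogram, and `leaf_order` is one of the linear orderings of the data points that the tree induces.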

Hierarchical Clustering - Summary
– Results depend on distance update method
– Greedy iterative process
– NOT robust against noise
– No inherent measure to identify stable clusters

2 good clouds COMPACT WELL SEPARATED CLOUDS – EVERYTHING WORKS

2 flat clouds 2 FLAT CLOUDS - SINGLE LINKAGE WORKS

filament SINGLE LINKAGE SENSITIVE TO NOISE
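The filament effect is easy to reproduce. Cutting a single-linkage dendrogram at resolution T gives the connected components of the graph that links all pairs of points at distance <= T, so a thin chain of noise points between two clouds is enough to fuse them. This is a sketch on hypothetical points, using a small union-find structure:

```python
import math


def count_single_linkage_clusters(points, threshold):
    """Number of clusters when a single-linkage dendrogram is cut at `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # link the two components

    return len({find(i) for i in range(len(points))})


# Two compact hypothetical clouds and a chain of noise points between them.
cloud_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
cloud_b = [(5.0, 0.0), (5.5, 0.0), (5.0, 0.5), (5.5, 0.5)]
filament = [(1.5, 0.25), (2.5, 0.25), (3.5, 0.25), (4.5, 0.25)]

clean = count_single_linkage_clusters(cloud_a + cloud_b, threshold=1.1)
noisy = count_single_linkage_clusters(cloud_a + cloud_b + filament, threshold=1.1)
```

Without the filament the two clouds stay apart at this resolution; with only four extra points they merge into one cluster.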

Average linkage
Distance between joined clusters: need to define the distance between the new cluster and the other clusters.
Average Linkage: average distance between all pairs.
Dendrogram
