Making Sense of Complicated Microarray Data Part II Gene Clustering and Data Analysis Gabriel Eichler Boston University Some slides adapted from: MeV documentation.



Why Cluster? Clustering is a process by which you can explore your data in an efficient manner. Visualization of data can help you review the data quality. Assumption: Guilt by association – similar gene expression patterns may indicate a biological relationship.

Expression Vectors: Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types. [Figure: the same expression vector drawn as a numeric vector, a line graph, and a heatmap; scale from -2 to 2.]

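Expression Vectors As Points in ‘Expression Space’ [Figure: a table of genes against conditions t1, t2, t3, and the same genes plotted as points on axes Experiment 1, Experiment 2, and Experiment 3; genes with similar expression lie close together.]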

Distance and Similarity
- The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms.
- The distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression.
- The selection of a distance metric defines the concept of distance.

Distance: a measure of similarity between gene expression.

         Exp 1     Exp 2     Exp 3     Exp 4     Exp 5     Exp 6
Gene A   x_{1A}    x_{2A}    x_{3A}    x_{4A}    x_{5A}    x_{6A}
Gene B   x_{1B}    x_{2B}    x_{3B}    x_{4B}    x_{5B}    x_{6B}

Some distances (MeV provides 11 metrics):
1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$
2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$
3. Pearson correlation
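The three metrics listed above can be sketched in a few lines of Python. This is an illustrative sketch, not MeV's implementation: the function names and the two sample vectors are made up, and turning the Pearson correlation r into a distance as 1 - r is one common convention that is assumed here.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - r, where r is the Pearson correlation (assumed convention)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Hypothetical expression vectors over six experiments.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

print(euclidean(gene_a, gene_b))
print(manhattan(gene_a, gene_b))
print(pearson_distance(gene_a, gene_b))
```

Gene B is exactly twice Gene A, so the two vectors are far apart under the Euclidean and Manhattan metrics but essentially identical under the correlation-based distance. This is why the choice of metric defines what "similar" means.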

Clustering Algorithms

Be wary - confounding computational artifacts are associated with all clustering algorithms.
- You should always understand the basic concepts behind an algorithm before using it.
- Anything will cluster! Garbage in means garbage out.

Hierarchical Clustering (HCL-1) IDEA: Iteratively combine genes into groups based on similar patterns of observed expression. By combining genes with genes OR genes with groups, the algorithm produces a dendrogram of the hierarchy of relationships. Display the data as a heatmap and dendrogram. Cluster genes, samples, or both.
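The iterative merging described above can be sketched as follows. The slides do not say how the distance between two groups is computed, so this sketch assumes average linkage with Euclidean distance; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    dists = [euclidean(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def hierarchical_cluster(vectors):
    """Return the merges as (left_members, right_members, distance) tuples."""
    clusters = [(i,) for i in range(len(vectors))]  # start with one cluster per gene
    merges = []
    while len(clusters) > 1:
        # Find the closest pair: gene with gene, or gene with group.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        merges.append((left, right, average_linkage(left, right, vectors)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges

# Hypothetical data: genes 0 and 1 are alike, genes 2 and 3 are alike.
genes = [
    [0.0, 0.1, 0.2],
    [0.1, 0.1, 0.3],
    [2.0, 2.1, 1.9],
    [2.1, 2.0, 2.0],
]
merges = hierarchical_cluster(genes)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

The list of merges is the dendrogram in tabular form: N genes always produce N - 1 merges, whether or not the data have any real structure.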

Hierarchical Clustering Gene 1 Gene 2 Gene 3 Gene 4 Gene 5 Gene 6 Gene 7 Gene 8


Hierarchical Clustering [Figure: color scale from H to L.]

The Leaf Ordering Problem: Find the ‘optimal’ layout of branches for a given dendrogram architecture. There are $2^{N-1}$ possible orderings of the branches. For a small microarray dataset of 500 genes there are about $1.6 \times 10^{150}$ branch configurations. [Figure: heatmap with axes Samples and Genes.]
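The count quoted above is easy to check. A dendrogram over N leaves has N - 1 internal branch points, and each one can be flipped independently, which gives 2^(N-1) layouts. The short sketch below evaluates that expression; the function name is made up for illustration.

```python
def leaf_orderings(n_leaves):
    """Number of branch layouts for a fixed dendrogram: 2 ** (N - 1)."""
    return 2 ** (n_leaves - 1)

small = leaf_orderings(4)    # a 4-gene dendrogram
large = leaf_orderings(500)  # the 500-gene example from the slide

print(small)
print(f"{large:.1e}")
```

Exhaustive search over the layouts is therefore out of the question even for small datasets, which is why leaf ordering is treated as an optimization problem.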

Hierarchical Clustering The Leaf Ordering Problem:

Hierarchical Clustering
Pros:
– Commonly used algorithm
– Simple and quick to calculate
Cons:
– Real genes probably do not have a hierarchical organization

Self-Organizing Maps (SOMs) Idea: Place genes onto a grid so that genes with similar patterns of expression are placed on nearby squares. [Figure: items a, b, c, d and grid squares A, B, C, D.]
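A minimal sketch of the idea follows, assuming a generic SOM training loop rather than the one used by MeV or GEDI. Every name and parameter here (`train_som`, `best_matching_unit`, the learning rate, the neighborhood radius, the toy genes) is a hypothetical choice for illustration. Each grid square holds a reference vector; each gene pulls its best-matching square, and that square's grid neighbors, toward itself.

```python
import math
import random

def best_matching_unit(grid, vector):
    """Return the grid square whose reference vector is closest to `vector`."""
    return min(grid, key=lambda node: math.dist(grid[node], vector))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    rng = random.Random(seed)
    # One reference vector per square, initialised from randomly chosen genes.
    grid = {(r, c): list(rng.choice(vectors))
            for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vector in rng.sample(vectors, len(vectors)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
            bmu = best_matching_unit(grid, vector)
            for node, weights in grid.items():
                grid_dist = math.dist(node, bmu)       # distance on the grid
                influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                for k, value in enumerate(vector):
                    weights[k] += lr * influence * (value - weights[k])
            step += 1
    return grid

# Hypothetical genes: a and b share one pattern, c and d share another.
genes = {
    "gene_a": [0.0, 0.1, 0.2],
    "gene_b": [0.1, 0.0, 0.3],
    "gene_c": [2.0, 2.1, 1.9],
    "gene_d": [2.1, 2.0, 2.0],
}
som = train_som(list(genes.values()), rows=2, cols=2)
placement = {name: best_matching_unit(som, vec) for name, vec in genes.items()}
print(placement)
```

Because neighboring squares are updated together, genes with similar expression patterns tend to end up on the same or adjacent squares. The exact placement depends on the random seed and the decay schedule.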


Self-organizing Maps (SOMs)

Self-organizing Maps (SOMs)

The Gene Expression Dynamics Inspector (GEDI) [Figure: expression matrix of genes (Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, Gene 6, …) by samples (A1 to A4, B1 to B4, C1 to C4), organized into Group A, Group B, and Group C; color scale from H to L.]
GEDI’s Features:
- Allows for simultaneous analysis of several time courses or datasets
- Displays the data in an intuitive, comparable, mathematically driven visualization
- The same genes map to the same tiles

Software Demonstrations MeV available at GEDI available at

Comparison of GEDI vs. Hierarchical Clustering: hierarchical clustering of random data (GIGO). From: CreateGEP_Journal.wpd, random_A. GEDI allows the direct visual assessment of the quality of conventional cluster analysis.

Questions