Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:

Slides:



Advertisements
Similar presentations
Clustering II.
Advertisements

BioInformatics (3).
Basic Gene Expression Data Analysis--Clustering
Cluster Analysis: Basic Concepts and Algorithms
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)
Cluster analysis for microarray data Anja von Heydebreck.
Machine Learning and Data Mining Clustering
Unsupervised learning: Clustering Ata Kaban The University of Birmingham
Clustering II.
DNA Microarray Bioinformatics - #27611 Program Normalization exercise (from last week) Dimension reduction theory (PCA/Clustering) Dimension reduction.
Region Segmentation. Find sets of pixels, such that All pixels in region i satisfy some constraint of similarity.
Dimension reduction : PCA and Clustering Agnieszka S. Juncker Slides: Christopher Workman and Agnieszka S. Juncker Center for Biological Sequence Analysis.
Today Unsupervised Learning Clustering K-means. EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms Ali Al-Shahib.
Microarray analysis Golan Yona ( original version by David Lin )
L16: Micro-array analysis Dimension reduction Unsupervised clustering.
4. Ad-hoc I: Hierarchical clustering
Clustering Specific Issues related to Project 2. Reducing dimensionality –Lowering the number of dimensions makes the problem more manageable Less memory.
Dimension reduction : PCA and Clustering by Agnieszka S. Juncker
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Clustering Petter Mostad. Clustering vs. class prediction Class prediction: Class prediction: A learning set of objects with known classes A learning.
Alizadeh et. al. (2000) Stephen Ayers 12/2/01. Clustering “Clustering is finding a natural grouping in a set of data, so that samples within a cluster.
Clustering (Gene Expression Data) 6.095/ Computational Biology: Genomes, Networks, Evolution LectureOctober 4, 2005.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Microarray analysis Algorithms in Computational Biology Spring 2006 Written by Itai Sharon.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Introduction to Bioinformatics - Tutorial no. 12
Fuzzy K means.
Neural Network Homework Report: Clustering of the Self-Organizing Map Professor : Hahn-Ming Lee Student : Hsin-Chung Chen M IEEE TRANSACTIONS ON.
Ulf Schmitz, Pattern recognition - Clustering1 Bioinformatics Pattern recognition - Clustering Ulf Schmitz
Clustering Unsupervised learning Generating “classes”
Data mining and machine learning A brief introduction.
Partitional and Hierarchical Based clustering Lecture 22 Based on Slides of Dr. Ikle & chapter 8 of Tan, Steinbach, Kumar.
Microarrays.
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
es/by-sa/2.0/. Principal Component Analysis & Clustering Prof:Rui Alves Dept Ciencies Mediques.
Clustering What is clustering? Also called “unsupervised learning”Also called “unsupervised learning”
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.
Microarray Data Analysis (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct 13, 2005 ChengXiang Zhai Department of Computer Science University of.
CS654: Digital Image Analysis
Quantitative analysis of 2D gels Generalities. Applications Mutant / wild type Physiological conditions Tissue specific expression Disease / normal state.
Clustering Gene Expression Data BMI/CS 576 Colin Dewey Fall 2010.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Prepared by: Mahmoud Rafeek Al-Farra
Cluster Analysis Potyó László. Cluster: a collection of data objects Similar to one another within the same cluster Similar to one another within the.
Course Work Project Project title “Data Analysis Methods for Microarray Based Gene Expression Analysis” Sushil Kumar Singh (batch ) IBAB, Bangalore.
By Timofey Shulepov Clustering Algorithms. Clustering - main features  Clustering – a data mining technique  Def.: Classification of objects into sets.
Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Analyzing Expression Data: Clustering and Stats Chapter 16.
Slide 1 EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Flat clustering approaches
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 28 Nov 9, 2005 Nanjing University of Science & Technology.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining and Text Mining. The Standard Data Mining process.
Clustering Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Semi-Supervised Clustering
Machine Learning Clustering: K-means Supervised Learning
Dimension reduction : PCA and Clustering by Agnieszka S. Juncker
John Nicholas Owen Sarah Smith
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Hierarchical Clustering
Unsupervised Learning: Clustering
Presentation transcript:

Microarray analysis 2 Golan Yona

2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i: (x1,x2,x3,…) This is an expression profile of a gene x1,x2,x3,.. are usually in log scale

Example: Yeast cell-cycle data Time Expression at time t Background Expression (control)

Each gene has its own profile Gene X Gene Y

Genes can have similar expression profiles..

..because  They can be part of the same complex as interacting proteins (strong constraint)  They can be part of the same pathway without interacting directly  They can have similar regulatory elements (not necessarily functionally related)  They can have similar regulatory elements and similar sequences -> similar functions (fail-safe mechanisms through redundancy by gene duplication -> robustness, immunity, concurrency)

GLYCOLYSIS: Energy production when oxygen is not available

But similarity is not obvious

..yet strong patterns emerge Cluster of 88 proteins involved in DNA replication and repair, mitosis and meiosis Cluster of 141 proteins involved in translation initiation, ribosomal proteins, tRNA synthetases

How to measure similarity? Given two expression vectors U,V of dimension d (number of features) = 1/d  i v i  v = 1/d  i (v i - ) 2

Problem with missing values

Possible solutions Averaging: replace every missing feature with the average over all genes. Nearest-neighbor approach (better): use only similar genes to define the values for the missing features (with weights) EM approach (even better)– iteratively define classes based on the current approximation, and re-estimate the missing features based only on genes in these groups. Repeat until convergence. Or use just the available data – less noise but have to normalize properly

How to measure significance of expression similarity Effective solution (see paper in the website): permute the vectors multiple times and compute the distance between the permuted vectors to generate a background distribution. Use that distribution to compute the zscore of the original distance value.

Correlation and anti- correlation

How to detect patterns in expression arrays Clustering Dimensionality reduction Statistical and graphical models (e.g. Bayesian networks)

Clustering Unsupervised learning technique for data exploration and pattern discovery “Clustering is finding a natural grouping in a set of data, so that samples within a cluster will be more similar to each other than they are to samples in other clusters.” There are many clustering algorithms. We will focus on two. The goal: finding groups of correlated genes (“signature groups”) and extract features of groups. Clustering can also be done for experiments

Hierarchical clustering Clusters are composed of subclusters that are composed of subclusters that are composed of subclusters..  The bottom level - each point (gene) is a cluster (n clusters).  Next level - two clusters are merged (n-1 clusters)  Next level - two clusters (from the previous level) are merged (n-2 clusters)  …  Top level – all n points are in one cluster Usually represented in dendogram Two types: divisive and agglomerative

Divisive Top-down Start with all samples in one cluster and successively split into separate clusters

Agglomerative Bottom-up approach Less computationally intensive Start with n singletons and successively merge clusters Outline: Place each sample (gene) in a separate cluster Repeat: Merge the two most similar clusters into one cluster Until all samples are in one cluster

Types of agglomerative clustering The types differ in the way we measure similarity/distance between clusters Single linkage – the distance between two clusters is the minimal distance between points in these clusters Maximal linkage - the distance between two clusters is the maximal distance between points in these clusters Average linkage - the distance between two clusters is the average distance between points in these clusters

Figure 1

Figure 2:

Figure 3:

Partitional clustering: k-means Another popular clustering algorithm Set the number of clusters k. Select centroids for these clusters (at random, or based on PCA) Repeat: - Classify each sample to the closest class (closest centroid) - Recompute the centroid of each class as the average of the samples classified to that class Until means (centroids) converge

Open issues Unstable properties of clustering algorithms Interpretation of results (there are other factors)