1
Fuzzy K means
2
Fuzzy K means
A gene can be assigned to several clusters.
Each gene is assigned to each cluster with a membership value between 0 and 1.
The membership values of a gene add up to one.
Genes with lower membership values are not well represented by the cluster centroid.
The expression of genes with high membership values is close to the cluster centroid.
3
Centroid
During the centroid refinement in each clustering cycle, new centroids were calculated as the weighted mean of all the gene-expression patterns in the data set, according to:
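The slide's formula is not preserved in this transcript. As a rough sketch of what a membership-weighted mean could look like, the following assumes the common fuzzy clustering form in which each gene contributes with its membership squared, optionally multiplied by a per-gene weight. The name `update_centroids`, the exponent of 2, and the toy values are assumptions, not taken from the slides.

```python
import numpy as np

def update_centroids(X, memberships, gene_weights=None, fuzziness=2.0):
    """Hypothetical helper: centroids as weighted means of all genes.

    X            : (n_genes, n_conditions) expression matrix
    memberships  : (n_genes, n_clusters), each row sums to 1
    gene_weights : optional (n_genes,) per-gene weights (treated as 1 if omitted)
    fuzziness    : exponent applied to memberships (assumed to be 2)
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(memberships, dtype=float) ** fuzziness
    if gene_weights is not None:
        w = w * np.asarray(gene_weights, dtype=float)[:, None]
    # Centroid j = sum_i(w_ij * x_i) / sum_i(w_ij), summed over all genes i.
    return (w.T @ X) / w.sum(axis=0)[:, None]

# Toy data (made-up values): three genes, two conditions, two clusters.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
memberships = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
centroids = update_centroids(X, memberships)
```

The third gene belongs equally to both clusters, so it pulls both centroids slightly toward itself, but less strongly than the genes with full membership.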
4
Membership Function
Each gene's membership m (a continuous variable from 0 to 1) is defined as:
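The membership formula itself is missing from this transcript. A minimal sketch, assuming the widely used inverse-squared-distance form with distance taken as 1 minus the Pearson correlation, is shown here. Both the distance choice and the function names are assumptions, and the sketch assumes no gene or centroid has a constant profile.

```python
import numpy as np

def pearson_distance(X, centroids):
    """1 - Pearson correlation between every gene (row of X) and every centroid."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Cc = centroids - centroids.mean(axis=1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Cn = Cc / np.linalg.norm(Cc, axis=1, keepdims=True)
    return 1.0 - Xn @ Cn.T

def memberships(X, centroids, eps=1e-12):
    """Assumed form: inverse squared distance, normalized so each row sums to 1."""
    d2 = pearson_distance(X, centroids) ** 2 + eps  # eps avoids division by zero
    inv = 1.0 / d2
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data (made-up values): three genes measured under three conditions.
X = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0]])
centroids = np.array([[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]])
m = memberships(X, centroids)
```

Under these assumptions, the first two genes are almost fully assigned to one centroid each, while the third gene splits its membership between both.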
5
Fuzzy K means
The gene weight (applied only in the second and third rounds) is empirically defined as:
where the definition uses the Pearson correlation between Xi and Xn and a correlation cutoff.
6
Fuzzy K means
In each clustering cycle, the centroids were iteratively refined until the average change was < 0.001. Around 85% of the centroids stabilized within approximately 15 iterations; some centroids required more iterations before stabilizing.
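The refinement loop with its stopping rule can be sketched as follows. The tolerance of 0.001 comes from the slide; everything else is an assumption made for illustration (squared Euclidean distance, memberships proportional to 1/d^2, centroids weighted by membership squared, the name `fuzzy_kmeans_cycle`, and the synthetic data).

```python
import numpy as np

def fuzzy_kmeans_cycle(X, centroids, tol=1e-3, max_iter=100, eps=1e-12):
    """Refine centroids until their average change drops below `tol`."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        # Squared distance from every gene to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + eps
        inv = 1.0 / d2
        m = inv / inv.sum(axis=1, keepdims=True)    # each row sums to 1
        w = m ** 2
        new_centroids = (w.T @ X) / w.sum(axis=0)[:, None]
        avg_change = np.abs(new_centroids - centroids).mean()
        centroids = new_centroids
        if avg_change < tol:
            break
    # `m` holds the memberships computed just before the final update.
    return centroids, m, iteration

# Synthetic data: two well-separated groups of 20 genes over 4 conditions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(3.0, 0.1, (20, 4))])
centroids, m, n_iter = fuzzy_kmeans_cycle(X, X[[0, 20]])
```

The `max_iter` cap is a safety net for centroids that take longer to stabilize.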
7
Fuzzy K means
After each clustering cycle, each centroid was compared to all other centroids in the set, and centroid pairs correlated > 0.9 were replaced by their average.
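One possible way to implement this merging step is sketched below. The 0.9 threshold is from the slide; the greedy order (most correlated pair first, repeated until no pair exceeds the threshold) and the name `merge_correlated_centroids` are assumptions, since the slides do not say how chains of similar centroids are handled.

```python
import numpy as np

def merge_correlated_centroids(centroids, threshold=0.9):
    """Replace centroid pairs whose Pearson correlation exceeds `threshold`
    by their average, most correlated pair first (assumed order)."""
    cents = [np.asarray(c, dtype=float) for c in centroids]
    while len(cents) > 1:
        corr = np.corrcoef(np.vstack(cents))
        np.fill_diagonal(corr, -np.inf)           # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break
        merged = (cents[i] + cents[j]) / 2.0
        cents = [c for k, c in enumerate(cents) if k not in (i, j)]
        cents.append(merged)                      # merged centroid goes last
    return np.vstack(cents)

# Toy centroids (made-up values).
centroids = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.0, 3.1, 3.9],   # nearly identical to the first
    [4.0, 1.0, 3.0, 0.0],
])
merged = merge_correlated_centroids(centroids)
```

Here the first two centroids collapse into one averaged centroid and the third is left untouched.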
8
Visualization Tools
9
Cells respond to environment
Biology Background
Cells respond to the environment. This is clear because the same copy of the genome is located in every cell in an organism, but genes are expressed at different levels in different cells (otherwise, we'd have uniform behavior of cells, which is clearly not the case). Cells respond to different factors, including environmental conditions, heat, food supply, and various external messages. Cells differentiate during development. Gene expression has a lot to do with what proteins are required at that particular point in the cell's development or location.
[Figure labels: Responds to environmental conditions, Heat, Food Supply, Various external messages]
10
Genome is fixed – Cells are dynamic
A genome is static
Every cell in our body has a copy of the same genome
A cell is dynamic
Responds to external conditions
Saccharomyces cerevisiae cells follow a cell cycle of division and also budding
Cells differentiate during development
11
Gene regulation
Gene regulation is responsible for the dynamic behavior of the cell.
Gene expression varies according to cell type and external conditions.
12
Transcription Factors Binding to DNA
Transcription regulation:
Certain transcription factors bind DNA.
Binding recognizes DNA substrings: regulatory motifs.
13
Regulation of Genes
[Figure labels: Transcription Factor (Protein), RNA polymerase, DNA, Regulatory Element, Gene]
14
Regulation of Genes
[Figure labels: Transcription Factor (Protein), RNA polymerase, DNA, Regulatory Element, Gene]
15
Regulation of Genes
[Figure labels: New protein, RNA polymerase, Transcription Factor, DNA, Regulatory Element, Gene]
16
The Challenges of Gene Expression Data
Many genes have expression data patterns that are similar to multiple, distinct gene groups.
17
Results of Clustering Gene Expression
CLUSTER is simple and easy to use.
It is the de facto standard for microarray analysis.
Limitations:
Hierarchical clustering, like other clustering methods in general, is not robust.
Genes may belong to more than one cluster.
18
A gene can be co-expressed with different gene groups in response to different conditions.
19
Saccharomyces cerevisiae
The yeast Saccharomyces cerevisiae possesses sophisticated mechanisms to choreograph the expression of its 6200 genes in order to thrive, or at least to survive, in a wide range of environmental conditions.
20
The gene expression of 40 Yap1p targets. These genes were coordinately induced in response to a subset of the conditions shown here (labeled in red).
21
What is a microarray
22
What is a microarray (2)
A 2D array of DNA sequences from thousands of genes
Each spot has many copies of the same gene
Allows mRNAs from a sample to hybridize
Measures the number of hybridizations per spot
23
Goal of Microarray Experiments
Measure the level of gene expression across many different conditions:
Expression matrix M: {genes} × {conditions}, where Mij = |gene i| in condition j (the expression level of gene i in condition j)
Deduce gene function:
Genes with similar function are expressed under similar conditions
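To make the expression matrix concrete, here is a small sketch with made-up values, where rows are genes and columns are conditions. The gene names and numbers are purely illustrative; the point is that genes with similar profiles across conditions show a high Pearson correlation.

```python
import numpy as np

# Toy expression matrix (made-up values): rows are genes, columns are conditions.
genes = ["geneA", "geneB", "geneC"]
M = np.array([
    [0.1, 1.2, 2.3, 0.2],   # geneA
    [0.2, 1.0, 2.5, 0.1],   # geneB: rises and falls together with geneA
    [2.0, 0.3, 0.1, 1.9],   # geneC: roughly the opposite pattern
])

# Pearson correlation between every pair of gene expression profiles.
corr = np.corrcoef(M)

for i, name in enumerate(genes):
    print(name, np.round(corr[i], 2))
```

In this toy example geneA and geneB are strongly positively correlated, which is the kind of similarity that clustering uses to group genes.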
24
Fuzzy K-Means clustering
Each gene can belong to many clusters.
Soft (fuzzy) assignment of genes to clusters.
Each gene has 1.0 membership units, allocated amongst clusters based on correlation with the cluster means.
Cluster means are calculated by taking the weighted average of all the genes in the cluster.
25
Fuzzy K-Means clustering
Algorithm:
Use PCA to initialize cluster means
3 iterations of fuzzy k-means clustering, finding k/3 clusters per iteration
In each iteration, start with brand new clusters and initializations
And a few more heuristic tricks
26
Initialization
Use PCA to find a few eigenvectors for initialization
These features capture the directions of maximum variance
Must be orthonormal
27
Example Initialization
k/3 centroids defined from k/3 first eigenvectors
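A minimal sketch of this initialization, assuming the leading eigenvectors are used directly as the initial centroids, could look like this. The helper name `pca_initial_centroids` and the random test data are hypothetical; the eigenvectors are obtained from an SVD of the centered expression matrix, which yields orthonormal directions ordered by explained variance.

```python
import numpy as np

def pca_initial_centroids(X, n_centroids):
    """Hypothetical helper: leading principal axes as initial centroids.

    Rows of X are genes, columns are conditions.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each condition
    # Rows of Vt are orthonormal eigenvectors of the covariance matrix,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_centroids]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                 # synthetic data for illustration
k = 9
init = pca_initial_centroids(X, k // 3)      # k/3 centroids for one round
```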
28
Example First iteration of clustering
29
Iteration of the approach
Remove genes that have a Pearson correlation greater than 0.7 with a particular cluster
Intuition: the strong signal from these genes has been accounted for
Repeat
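The removal step can be sketched as a simple filter. The 0.7 cutoff is from the slide; the function name `remove_explained_genes` and the toy values are assumptions, and "a particular cluster" is interpreted here as "any of the current centroids".

```python
import numpy as np

def remove_explained_genes(X, centroids, cutoff=0.7):
    """Keep only genes whose Pearson correlation with every centroid is <= cutoff."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    n_genes = X.shape[0]
    # corrcoef stacks genes and centroids; take the gene-vs-centroid block.
    corr = np.corrcoef(X, centroids)[:n_genes, n_genes:]
    keep = corr.max(axis=1) <= cutoff
    return X[keep], keep

# Toy data (made-up values): three genes, one centroid.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],   # perfectly correlated with the centroid
    [4.0, 3.0, 2.0, 1.0],   # anti-correlated
    [2.0, 1.0, 2.0, 1.0],   # weakly (negatively) correlated
])
centroids = np.array([[1.0, 2.0, 3.0, 4.0]])
remaining, keep = remove_explained_genes(X, centroids)
```

Only the genes left in `remaining` would be passed to the next round of clustering.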
30
Removing Duplicate Centroids
Centroids with Pearson correlation > 0.9 will be averaged.
This allows selecting a large initial number of clusters, since duplicates will be removed.
31
Repeat 3 times
Output:
Cluster means
Gene assignments to clusters
35
Regulatory systems that govern the expression of overlapping sets of genes in yeast.
36
Fuzzy K means ADVANTAGES
The method can present overlapping clusters, revealing distinct features of each gene's function and regulation. The resulting implications can be used to assign refined hypothetical functions to uncharacterized gene products and additional cellular roles to well-studied proteins.
37
Fuzzy K means ADVANTAGES
It presents more comprehensive groups of conditionally co-regulated genes.
It elucidates the environmental conditions that trigger changes in gene expression.
It requires no a priori information about the dataset.
38
Fuzzy K means DISADVANTAGES
Assignment of genes to a cluster requires a user-defined cutoff, and selecting a meaningful cutoff is a challenge.
Fuzzy K means failed to identify a small number of groups that were identified by hierarchical clustering.
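The effect of the cutoff can be illustrated with a short sketch. The membership values, the two cutoff values, and the name `assign_genes` are made up for illustration; the slides do not state which cutoff was used.

```python
import numpy as np

def assign_genes(memberships, cutoff):
    """Return, for each cluster, the indices of genes with membership >= cutoff."""
    memberships = np.asarray(memberships, dtype=float)
    return [np.flatnonzero(memberships[:, j] >= cutoff).tolist()
            for j in range(memberships.shape[1])]

# Toy membership matrix (made-up values): three genes, two clusters.
m = np.array([
    [0.90, 0.10],
    [0.50, 0.50],
    [0.05, 0.95],
])
loose = assign_genes(m, cutoff=0.4)    # the middle gene lands in both clusters
strict = assign_genes(m, cutoff=0.8)   # the middle gene lands in neither
```

A loose cutoff produces overlapping clusters, while a strict one leaves ambiguous genes unassigned, which is why choosing the value is a judgment call.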
39
My opinion
The unique advantages of fuzzy K means clustering make the technique a valuable tool for gene expression analysis. Its flexibility can be used to reveal more complex correlations between gene expression patterns, promoting refined hypotheses about the role and regulation of gene expression changes.
40
To overcome these limitations, combining hierarchical clustering with fuzzy K means can be useful.
41
Thank you !