Suppose your data points live in Rn.

Slides:



Advertisements
Similar presentations
Hierarchical Clustering, DBSCAN The EM Algorithm
Advertisements

PARTITIONAL CLUSTERING
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Essential Statistics in Biology: Getting the Numbers Right Raphael Gottardo Clinical Research Institute of Montreal (IRCM)
University of CreteCS4831 The use of Minimum Spanning Trees in microarray expression data Gkirtzou Ekaterini.
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Finding generators for H1.
Web Sites for my ET710 Class.  This is the homepage for my portfolio. This page is where you will find the links to all of my projects.
Simplifying Variable Expressions 1. What are the coefficients of 6 + 2s s + 8g?
Mesh Coarsening zhenyu shu Mesh Coarsening Large meshes are commonly used in numerous application area Modern range scanning devices are used.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Vector Quantization CAP5015 Fall 2005.
1 Limma homework Is it possible that some of these gene expression changes are miscalled (i.e. biologically significant but insignificant p value and vice.
One Function of Two Random Variables
Change to scientific notation: A. B. C. 289,800, x x x
Clustering (2) Center-based algorithms Fuzzy k-means Density-based algorithms ( DBSCAN as an example ) Evaluation of clustering results Figures and equations.
Sept 25, 2013: Applicable Triangulations.
Computational Biology
Unsupervised Learning
7.1 Seeking Correlation LEARNING GOAL
Hierarchical clustering
Data Mining: Basic Cluster Analysis
More on Clustering in COSC 4335
Date of download: 10/13/2017 Copyright © ASME. All rights reserved.
We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological.
Sept 23, 2013: Image data Application.
Hallett, et al., - Supplementary Figure 1
plosone. org/article/info%3Adoi%2F %2Fjournal. pone
 The human genome contains approximately genes.  At any given moment, each of our cells has some combination of these genes turned on & others.
Microarray Clustering
5.3. Mapper on 3D Shape Database
Identification of combination treatment–responsive dysfunctional tumor-infiltrating CD8+ T cell population. Identification of combination treatment–responsive.
קורס פיננסי – מושגים פיננסיים / כלכליים
6. Introduction to nonparametric clustering
Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing  Graham Heimberg, Rajat.
Cluster Analysis of Microarray Data
You may choose to use any one of these provide slide templates and make changes as desired.
You may choose to use any one of these provide slide templates and make changes as desired. 1.
Cluster Analysis in Bioinformatics
High-throughput analysis of informative CpG sites for the four studied tumor suppressor genes (A-D). High-throughput analysis of informative CpG sites.
You may choose to use any one of these provide slide templates and make changes as desired.
You may choose to use any one of these provide slide templates and make changes as desired. 1.
GPX: Interactive Exploration of Time-series Microarray Data
Identification of miR‐499 targets
Linkage Genes that are physically located on the same chromosome are said to be “linked”. Linked genes are said to be “mapped” to the same chromosome.
OpenRBC: A Fast Simulator of Red Blood Cells at Protein Resolution
Volume 5, Issue 4, Pages e4 (October 2017)
You may choose to use any one of these provide slide templates and make changes as desired. 1.
Pearson correlation of gene expression identifies distinct groups of male- and female-enriched genes. Pearson correlation of gene expression identifies.
Global Proteome Analysis of the NCI-60 Cell Line Panel
Global Proteome Analysis of the NCI-60 Cell Line Panel
Design and optimization of the computational model.
Clustering The process of grouping samples so that the samples are similar within each group.
Functional genomics: Learning to think about gene expression data
Volume 7, Issue 2, Pages (August 2010)
Ten Frames Tallies Base-ten Blocks.
The first two principal components for the islet gene expression data for the 181 microarray probes that map to the chromosome 6 trans-eQTL hotspot with.
RNA expression data. RNA expression data. Heat maps comparing the normalized log2 of the ratios of Fur−, F90, and F1 signals on the cl20 control signal.
Clustering.
Whole-genome microarray analysis of gene expression in the livers of control mice and STAM mice subjected to NASH-derived hepatocarcinogenesis. Whole-genome.
Fig. 1. Microarray analyses of genes whose expression is regulated by innervation during synaptogenesis.(A) Schematic drawings of the experimental design.
M-Wnt and E-Wnt cells cluster tightly with claudin-low and basal-like breast tumors, respectively, by microarray analysis. M-Wnt and E-Wnt cells cluster.
Microarray analysis of global gene expression profiles of PC-3 and derivative cell lines. Microarray analysis of global gene expression profiles of PC-3.
Schematic diagram illustrating the proposed mechanism by which Axin may interfere with normal mammary gland development in DDTg mice. Schematic diagram.
Coexpression of other immune genes with ImSig core signatures.
Pancreatic adenocarcinoma, chronic pancreatitis, and normal pancreas samples can be distinguished on the basis of gene expression profiling. Pancreatic.
Choosing a subject for your positive – negative space painting.
Unsupervised Learning
Standard Protocol Items: Recommendations for Interventional Trials diagram. Standard Protocol Items: Recommendations for Interventional Trials diagram.
Presentation transcript:

Suppose your data points live in Rn. Voronoi diagram: Suppose your data points live in Rn. Choose data point v. The Voronoi cell associated with v is H(v,w) U w ≠ v data points. In this very simplified case my data points lie in a two-dimensional plane. Normally data points are high dimensional. For example, I may be comparing the expression or thousands of genes in tumor cells to healthy cells using microarray data. OR I might be comparing politicians voting records. Or I might be comparing the stats of basketball players. These three applications were all, by the way, published by Lum et al this past February in Nature’s Scientific Reports. I have included a link to their paper on my youtube site. http://www.nature.com/srep/2013/130207/srep01236/full/srep01236.html The Voronoi cell associated with v is Cv= { x in Rn : d(x, v) ≤ d(x, w) for all w ≠ v }

Re-partition data set into 3 voronoi cells corresponding to the 3 centroids Figure modified from https://en.wikipedia.org/wiki/File:K_Means_Example_Step_1.svg

Applications of k-means clustering: 1.) Group like items together 2.) Reduce the size of your data set. 3.) Color quantization. (a) Reduce memory use. (b) Certain devices might have limited number of colors for display.

library("TDA") circle = circleUnif(300, r = 1) plot(circle, asp = 1) cl <- kmeans(circle, 10) plot(circle,col=cl$cluster) points(cl$centers, pch=8, cex = 2) plot(cl$centers, asp = 1) Rstudio:

For k-means, There are many methods for determining k k = number of desired clusters

Figures courtesy of Wako Bungula

Elbow method http://sites.northwestern.edu/msia/2016/12/ 08/k-means-shouldnt-be-our-only-choice/

Figures courtesy of Wako Bungula

Figures courtesy of Wako Bungula

Figures courtesy of Wako Bungula

If you care about closeness: Complete linkage Average linkage Ward’s linkage K-means If you care about connected components: Single linkage DBscan Figures courtesy of Wako Bungula

Dbscan (Density-based spatial clustering of applications with noise ) Figures courtesy of Wako Bungula

Figures courtesy of Wako Bungula

Figures courtesy of Wako Bungula

Figures courtesy of Wako Bungula

http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html