Published by Elmer Houston. Modified over 9 years ago.
1
Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data
Parisa Shooshtari. School of Computing Science, Simon Fraser University, Burnaby; Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver
2
Outline: Flow Cytometry (FCM) data; clustering of FCM data; spectral clustering; faithful sampling for spectral clustering; results; summary
3
Basics of Flow Cytometry Technique
[Figure: fluorescence intensity versus wavelength for a sample stained for MHC-II and CD-11c; labels Int-1 and Int-2 mark the measured intensities.]
4
Cell Population Identification in Flow Cytometry (FCM)
[Figure: gated scatter plots; axes: Parameter 1 vs. Parameter 2 and Parameter 3 vs. Parameter 4; gate label: X%.] Now think of this cell as just one of thousands of cells flowing through a tube one at a time. These cells can be differentiated by their fluorescence intensity, which indicates, for example, the presence or absence of a particular cell surface protein. Here each dot represents an individual cell, and the axes indicate intensity at different wavelengths. A gate can then be drawn to select a particular subset of the cell population with common intensities. Further sub-setting can be done based on 1-D and 2-D projections of the data. Adapted from the Science Creative Quarterly (2)
5
Importance of FCM Data Clustering
Manual gating is subjective, error-prone, and time-consuming, and it ignores the multivariate nature of the data. Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques.
6
Which Clustering Algorithm Is Suitable?
Model-based algorithms such as flowClust, flowMerge, and FLAME are not suitable for clusters with non-elliptical shapes. [Figure: a good clustering compared with the flowMerge result; axis label: GFP.]
7
Our Motivation for Using Spectral Clustering
Spectral clustering does not require any a priori assumption about cluster size, shape, or distribution. It is not sensitive to outliers, noise, or the shape of clusters.
8
Spectral Clustering in One Slide
Represent the data set by a similarity graph. Construct the graph: the vertices are the data points p1, p2, …, pn; the edge weights are the similarity values S_{i,j} [formula image not preserved]. Clustering: find a cut through the graph, that is, define a cut objective function and solve it.
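The graph-and-cut recipe on this slide can be illustrated with a minimal sketch. It assumes a Gaussian kernel for the edge weights and a relaxed normalized cut solved through the second eigenvector of the normalized Laplacian; both are common choices assumed here, not details given on the slide, and the function names and the `sigma` parameter are hypothetical.

```python
import numpy as np

def gaussian_similarity(points, sigma):
    """Dense similarity matrix; the Gaussian kernel is an assumed choice."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def spectral_bipartition(points, sigma=1.0):
    """Split the points into two groups with a relaxed normalized cut."""
    s = gaussian_similarity(points, sigma)
    np.fill_diagonal(s, 0.0)                 # no self-loops in the graph
    degree = s.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * s * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, mapped back by D^{-1/2}
    fiedler = d_inv_sqrt * eigenvectors[:, 1]
    return (fiedler > 0).astype(int)         # the sign gives the side of the cut
```

Building the dense n × n matrix `s` is exactly the step that stops scaling once n reaches hundreds of thousands of cells.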
9
The Bottleneck of Spectral Clustering
Serious empirical barriers arise when applying this algorithm to large datasets. Time complexity: O(n^3) → 2 years for 300,000 data points (cells). Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells).
10
Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data
Uniform sampling: low-density populations close to dense ones may not remain distinguishable. Faithful sampling: tends to choose more samples from the non-dense parts of the data.
11
How Does Our Faithful Sampling Preserve Information?
Space-uniform sampling: it preserves the low-density parts of the data by selecting more samples from them than uniform sampling does. Keeping the list of points in the neighbourhood of each sample: these lists are used to define similarities between communities.
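A minimal sketch of space-uniform sampling with neighbourhood lists follows. It assumes a fixed neighbourhood radius `h` and a greedy procedure: pick a random unassigned point, then assign every unassigned point within `h` of it to its community. The radius `h`, the `seed` argument, and the function name are hypothetical, and the slides do not spell out the exact procedure.

```python
import numpy as np

def faithful_sampling(points, h, seed=0):
    """Pick representatives more than h apart and record each one's community.

    h (neighbourhood radius) and seed are hypothetical parameters.
    """
    rng = np.random.default_rng(seed)
    unassigned = np.ones(len(points), dtype=bool)
    representatives = []   # indices of the chosen sample points
    communities = []       # communities[k]: indices of points near representative k
    while unassigned.any():
        candidates = np.flatnonzero(unassigned)
        rep = rng.choice(candidates)
        dist = np.linalg.norm(points[candidates] - points[rep], axis=1)
        members = candidates[dist <= h]      # always includes rep itself
        representatives.append(int(rep))
        communities.append(members)
        unassigned[members] = False          # these points cannot be picked again
    return representatives, communities
```

Because every representative removes its whole neighbourhood, a dense region yields roughly as many representatives per unit of space as a sparse one, so sparse populations are over-represented relative to uniform sampling.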
12
Clustering Result
Low-density populations surrounded by dense ones
13
Clustering Result
Populations with non-elliptical shapes; subpopulations of a major population. [Figure panels: SamSPECTRAL, flowMerge, FLAME.]
14
Dependence of SamSPECTRAL Results on the Scaling Factor (σ)
[Figure: SamSPECTRAL results for σ = 100, σ = 200, σ = 300, and σ = 400; labeled populations: monocytes, dendritic cells, B cells.]
15
Block Diagram of Clustering Ensemble Method
Run SamSPECTRAL with scaling factors σ1, σ2, …, σr → build new feature vectors → compute similarities between categorical feature vectors → SamSPECTRAL for categorical data → final results
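The two middle boxes of the diagram can be sketched as follows. Each point's new feature vector is taken to be the list of cluster labels it received in the r runs, and the similarity between two such categorical vectors is taken to be the fraction of runs in which the two points share a label. That simple matching measure is an assumption made for illustration, since the slide does not name the measure used, and the function names are hypothetical.

```python
import numpy as np

def categorical_features(label_runs):
    """Stack r label vectors into an (n, r) matrix, one row per data point."""
    return np.column_stack(label_runs)

def matching_similarity(features):
    """Fraction of runs in which two points share a cluster label (assumed measure)."""
    agree = features[:, None, :] == features[None, :, :]
    return agree.mean(axis=-1)

# Three hypothetical runs over four points; label names differ between runs,
# which is harmless because labels are only compared within the same run.
runs = [np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]), np.array([0, 0, 0, 1])]
similarity = matching_similarity(categorical_features(runs))
```

The resulting matrix can then be handed to a graph-based clustering step as the edge weights, which corresponds to the last box before the final results.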
16
Results After Applying Clustering Ensemble Method
[Figure: final result after applying the clustering ensemble method compared with manual gating; axes: CD14 and MHC-II; labeled populations: monocytes, B cells, dendritic cells.]
17
Advantages of Using Clustering Ensemble Method
No need for manual setting of initial parameters. Higher quality and stability of clustering results. The F-measure between manual gating and the original SamSPECTRAL is on average 0.77 (sd = 0.07). The F-measure between manual gating and our clustering ensemble method is 0.91.
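For readers unfamiliar with the metric, here is a minimal sketch of one common clustering F-measure. Each manually gated population is matched to the cluster that gives it the best harmonic mean of precision and recall, and the scores are averaged weighted by population size. The slides do not state which variant was used, so this definition and the function name are assumptions.

```python
import numpy as np

def f_measure(reference, predicted):
    """Size-weighted best-match F-measure of predicted clusters against reference labels."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    total = 0.0
    for ref_label in np.unique(reference):
        in_ref = reference == ref_label
        best = 0.0
        for pred_label in np.unique(predicted):
            in_pred = predicted == pred_label
            overlap = np.sum(in_ref & in_pred)
            if overlap == 0:
                continue
            precision = overlap / in_pred.sum()
            recall = overlap / in_ref.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        total += in_ref.sum() * best       # weight by reference population size
    return total / len(reference)
```

A score of 1 means every gated population is reproduced exactly by some cluster, regardless of how the clusters are named.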
18
Summary
Spectral clustering can now be applied to large data sets through our proposed faithful (information-preserving) sampling. This sampling method can be combined with other graph-based clustering algorithms that use different objective functions to reduce the size of the data. We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying cell populations with non-elliptical shapes, low-density populations surrounded by dense ones, and sub-populations of a major population.
19
Acknowledgement
Committee: Dr. Arvind Gupta, Dr. Ryan Brinkman, Dr. Tobias Kollman. Co-authors on SamSPECTRAL: Habil Zare. Data providers: Connie Eaves, Peter Landsdrop, Keith Humphries.
20
Thanks for Your Attention!
22
SamSPECTRAL Algorithm