Published by Gerard Neal. Modified over 8 years ago.
1
Flow cytometry data analysis: SPADE for cell population identification and sample clustering
Narahara
2
Flow cytometry (FCM) data
Signals from multiple cell-surface markers are measured for each cell.
Single-cell measurement.
Multi-dimensional: up to 12 markers in standard flow cytometry, more than 30 in next-generation mass cytometry.
Analysis 1: Cell population identification. We want to identify a particular cell population (e.g. CD8+ T cells).
Analysis 2: Sample clustering. We want to predict a phenotype from the FCM pattern and to measure the similarity between two different samples.
3
Spanning-tree progression analysis of density-normalized events (SPADE)
First described for cell population identification (Qiu et al., Nature Biotechnology, 2011).
Unsupervised approach that identifies both known and unexpected cell types; no prior knowledge is needed.
Effective 2D visualization of multi-dimensional FCM data as a tree.
Extended to sample classification (Aghaeepour et al., Nature Methods, 2013) using a meta-SPADE tree and the Earth Mover's Distance (EMD).
4
SPADE for cell population identification
For each sample
5
Traditional methods
Manual gating: subject to the user’s knowledge; unsuitable for high-throughput data analysis.
Automated gating methods: clustering-based; they often miss continuity (the progression of cellular differentiation) and populations of rare cell types.
6
SPADE outline
[Figure: simulated 2-marker FCM data with manual gating, followed by the SPADE steps]
Down-sampling to equal density: rare cell types contribute to clustering equally with abundant types.
Clustering.
Minimum spanning tree: connects all clusters.
Up-sampling: maps all cells to clusters.
7
Three or more markers
The output is still a 2D tree structure.
8
1. Down-sampling
Cells form a high-dimensional point cloud: #points = #cells, #dimensions = #markers.
Many other clustering methods tend to capture the most abundant cell populations, whereas rare cell types are either excluded as outliers or absorbed by larger clusters.
Equalizing the density of the cloud increases the chance of identifying rare cell types.
LD_i: local density of cell i (the number of cells within its neighborhood), measured with the L1 (Manhattan) distance between cells.
User-defined parameters:
OD: outlier density (e.g. the 1st percentile of all LDs). Cells with a local density lower than OD are discarded as noise.
TD: target density (e.g. the 5th percentile of all LDs).
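The down-sampling step can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SPADE implementation: the fixed neighborhood `radius`, the nearest-rank `percentile` helper, and all function names are hypothetical, and the rule of keeping a dense cell with probability TD / LD_i reflects the published description of SPADE as I understand it rather than anything stated on the slide.

```python
import math
import random


def l1(a, b):
    """Manhattan (L1) distance between two marker vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def local_densities(cells, radius):
    """LD_i = number of cells within `radius` (L1) of cell i, itself included."""
    return [sum(1 for other in cells if l1(c, other) <= radius) for c in cells]


def percentile(values, pct):
    """Nearest-rank percentile (a simple stand-in for a library routine)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def downsample(cells, radius, od_pct=1, td_pct=5, seed=0):
    """Return the indices of the cells kept after density-dependent thinning."""
    rng = random.Random(seed)
    ld = local_densities(cells, radius)
    od = percentile(ld, od_pct)  # outlier density
    td = percentile(ld, td_pct)  # target density
    kept = []
    for i, density in enumerate(ld):
        if density < od:
            continue  # sparser than OD: discarded as noise
        # Assumed rule: keep sparse cells, thin dense regions towards TD.
        if density <= td or rng.random() < td / density:
            kept.append(i)
    return kept
```

With 40 cells in one dense spot and 4 cells in a rare spot, all 4 rare cells survive while the dense spot is thinned to roughly a tenth, which is the density equalization the slide describes. The pairwise density computation is quadratic in the number of cells, so this sketch only suits small examples.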
9
2. Clustering
Agglomerative (“bottom-up”) method.
Each cell starts as its own cluster.
Clusters are iteratively merged with their nearest cluster (single linkage, L1 distance).
Grouping is repeated until the number of clusters is reduced to the user-defined target number (e.g. 50).
#clusters is not the number of cell types you expect to differentiate; it defines how much you want to simplify the point cloud.
Note: single linkage is the minimum distance between two points in two different clusters.
Note: example #clusters: 50 for 8 markers, 300 for 13 markers.
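A naive version of single-linkage agglomerative clustering with the L1 distance is sketched below. It follows only what the slide states (merge the two nearest clusters until a target count is reached); the function names are hypothetical, and the actual SPADE software may use a more efficient or modified procedure.

```python
def l1(a, b):
    """Manhattan (L1) distance between two marker vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage(points, n_clusters):
    """Merge the two nearest clusters until only n_clusters remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # every cell starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of points across the two clusters.
                d = min(l1(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters
```

For example, `single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], 3)` groups the first two points, the next two points, and leaves the last point on its own.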
10
3. Minimum spanning tree
Construction of the MST.
An MST is a tree that links all nodes with the minimal total edge length.
Each cell cluster is a node; the median marker expression represents the cluster.
Edges are weighted by the distance between nodes.
SPADE uses Borůvka’s algorithm.
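The construction can be illustrated with a small Borůvka-style sketch over cluster medians. The choice of L1 as the edge weight, the tie-breaking on `(weight, i, j)`, and the function names are assumptions made for this example; the slide only says that edges are weighted by the distance between nodes.

```python
from statistics import median


def l1(a, b):
    """Manhattan (L1) distance between two marker vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_medians(cells, clusters):
    """Represent each cluster by its per-marker median expression."""
    n_markers = len(cells[0])
    return [tuple(median(cells[i][m] for i in members) for m in range(n_markers))
            for members in clusters]


def boruvka_mst(nodes):
    """Return MST edges (i, j, weight) over the complete graph on `nodes`."""
    n = len(nodes)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges, components = [], n
    while components > 1:
        cheapest = {}  # component root -> cheapest outgoing (weight, i, j)
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                cand = (l1(nodes[i], nodes[j]), i, j)
                for r in (ri, rj):
                    if r not in cheapest or cand < cheapest[r]:
                        cheapest[r] = cand
        for w, i, j in sorted(set(cheapest.values())):
            ri, rj = find(i), find(j)
            if ri != rj:  # skip edges already implied by an earlier merge
                parent[ri] = rj
                edges.append((i, j, w))
                components -= 1
    return edges
```

In each round every component picks its cheapest outgoing edge, so the number of components at least halves per round.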
11
4. Up-sampling
Mapping each cell to one cluster (node).
Assign each cell (cell A) to its nearest down-sampled cell (cell B), then assign cell A to the cluster that cell B belongs to.
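The up-sampling rule is a nearest-neighbor lookup, sketched here with a brute-force search. The function and argument names are hypothetical, and the use of the L1 distance for this step is an assumption carried over from the earlier steps.

```python
def l1(a, b):
    """Manhattan (L1) distance between two marker vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def upsample(all_cells, sampled_idx, sampled_cluster):
    """Give every cell the cluster of its nearest down-sampled cell.

    sampled_idx:     indices into all_cells of the down-sampled cells
    sampled_cluster: cluster id of each down-sampled cell (same order)
    """
    labels = []
    for cell in all_cells:
        nearest = min(range(len(sampled_idx)),
                      key=lambda k: l1(cell, all_cells[sampled_idx[k]]))
        labels.append(sampled_cluster[nearest])
    return labels
```

A down-sampled cell is its own nearest neighbor at distance 0, so it keeps the cluster it was given during clustering.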
12
5. Visualization and identification of cell types
SPADE visualizes the resulting tree in 2D, using a modified Fruchterman-Reingold algorithm for layout.
Nodes are colored by the intensity of each marker, giving one colored tree per marker.
Identification of cell types is manual.
13
SPADE for sample clustering
For comparing multiple data sets
14
Procedure
Down-sample each sample separately, adjusting TD so that each data set contributes the same number of cells.
Pool the down-sampled data into a meta-down-sampled data set, which forms a meta-cloud.
Cluster and construct the MST as described for single-sample SPADE.
Feature extraction: for each data set, calculate the percentage of cells in each cluster.
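The feature-extraction step reduces each data set to one vector of per-cluster percentages. A minimal sketch, assuming each cell of a sample has already been given a cluster id from 0 to n_clusters - 1 (the function name is hypothetical):

```python
from collections import Counter


def cluster_percentages(labels, n_clusters):
    """Percentage of a sample's cells that fall in each cluster of the meta-tree."""
    counts = Counter(labels)
    total = len(labels)
    return [100.0 * counts.get(c, 0) / total for c in range(n_clusters)]
```

For instance, a sample whose four cells carry the labels `[0, 0, 1, 2]` on a 4-cluster tree yields `[50.0, 25.0, 25.0, 0.0]`. Clusters that a sample never reaches stay in the vector as 0, so the vectors of all samples remain comparable.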
15
Classifying samples
PCA using the cellular distribution.
Distance (dissimilarity) between a pair of samples: cellular distribution + tree structure, measured by the Earth Mover’s Distance (EMD).
16
Earth mover’s distance
A measure of the distance between two probability distributions over a region.
Intuitively, the EMD is the minimal work (cost) needed to transform a mass of earth spread over a region into another shape.
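As a hedged formalization (the symbols p, q, d_ij, and f_ij are notation introduced here, not taken from the slides), for two normalized distributions p and q over the same n bins, with d_ij the ground distance between bins i and j and f_ij the mass moved from bin i to bin j, the EMD can be written as the optimum of a transportation problem:

```latex
\mathrm{EMD}(p, q) \;=\; \min_{f_{ij} \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{ij}
\quad \text{subject to} \quad
\sum_{j=1}^{n} f_{ij} = p_i, \qquad
\sum_{i=1}^{n} f_{ij} = q_j, \qquad
\sum_{i=1}^{n} p_i = \sum_{j=1}^{n} q_j = 1 .
```

The constraints say that all mass of p leaves its bins and exactly the mass of q arrives, so the objective is the total work of the cheapest such plan.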
17
EMD for clusters
Transportation problem on the tree: clusters are nodes, and edges are weighted by distance.
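When the ground distance between two clusters is taken to be the path length along the tree (an assumption for this sketch; the slides do not spell out the ground distance), the transportation problem has a simple closed form: every edge must carry exactly the net surplus of mass on one of its sides. The function name and input layout below are hypothetical.

```python
def tree_emd(p, q, edges):
    """EMD between two distributions on the nodes of a weighted tree.

    p, q:  per-node mass (e.g. fraction of cells per cluster), equal totals
    edges: (u, v, weight) triples forming a tree over nodes 0 .. len(p) - 1
    Assumes the ground distance is the path length along the tree.
    """
    adj = {i: [] for i in range(len(p))}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Breadth-first order from node 0, recording each node's parent edge.
    parent = {0: (None, 0.0)}
    order = [0]
    for node in order:  # the list grows while we iterate over it
        for nb, w in adj[node]:
            if nb not in parent:
                parent[nb] = (node, w)
                order.append(nb)

    surplus = [pi - qi for pi, qi in zip(p, q)]
    cost = 0.0
    for node in reversed(order[1:]):  # children before their parents
        par, w = parent[node]
        cost += w * abs(surplus[node])  # net mass that must cross this edge
        surplus[par] += surplus[node]
    return cost
```

On a path 0-1-2 with edge weights 1 and 2, moving all mass from node 0 to node 2 costs 3, the full path length. A general ground distance would instead require solving the transportation problem as a linear program.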
18
Software
Matlab
R/Bioconductor