StatQuest!
Heatmaps… You’ve seen them before….
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it.

1) The relative abundances have been scaled. In this case, this was done on a per-gene basis (other heatmaps scale all the genes at once). This makes it easy to see that sample X has more/less of gene Y than sample Z. For example, it’s easy to see that Sample 1 expresses this gene more than the others. However, this specific scaling means we can’t compare across genes: the dark red bar in Sample 1 for this gene doesn’t mean that Sample 1 transcribes it more than other genes, just more than the other samples. (A small code sketch of per-gene vs. global scaling follows this list.)

2) The rows/genes have been grouped according to “similarity”.
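To make point 1 concrete, here is a minimal Python sketch (not part of the original presentation) showing the difference between scaling each gene on its own and scaling all the genes at once; the count matrix is made up.

```python
import numpy as np

# Hypothetical read counts: 4 genes (rows) x 4 samples (columns).
counts = np.array([
    [ 12.,  45.,  30.,   8.],
    [200., 180., 220., 150.],
    [  5.,   9.,   4.,  11.],
    [ 60.,  20.,  75.,  40.],
])

# Per-gene scaling: each row gets its own mean and standard deviation,
# so colors can only be compared across samples *within* a gene.
per_gene = (counts - counts.mean(axis=1, keepdims=True)) / counts.std(axis=1, keepdims=True)

# Global scaling: one mean and standard deviation for the whole matrix,
# so colors are also comparable across genes.
global_scaled = (counts - counts.mean()) / counts.std()

print(np.round(per_gene, 2))
print(np.round(global_scaled, 2))
```

With per-gene scaling, every row uses the full color range; with global scaling, a highly expressed gene stays visibly brighter than the others.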
These genes are transcribed most in the 2nd sample (and least in the 4th sample).
These genes are transcribed most in the 1st sample (and least in the 4th sample).
These genes are transcribed most in the 2nd sample (and least in the 3rd sample).
The “clustering” isn’t by chance, but due to a computer program that tries to put “similar” things close together.
Without clustering, the data would look like this…
Without clustering or scaling, the data would look like this! Notice that one gene is highly transcribed compared to the others. It’s an outlier…
Another example…
This heatmap has been scaled and clustered.
The scaling is “global” – not per row/gene, but for all rows/genes. We can use “global” scaling because we don’t have an outlier like we did in the last dataset.
The clustering is by column/sample AND by row/gene: these columns/samples cluster together, and these rows/genes cluster together.
Without clustering…
Without clustering or scaling…
A quick aside…
What if we had used global scaling with the first heatmap?
Now using global scaling… The outlier skews the scale so much that it is impossible to see the other genes. Also, notice that the clustering changes and the genes have a new order.
Scaling can affect two things: 1) how brightly colored the genes are and whether you can compare between them, and 2) the clustering.
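Here is a small sketch of both effects, with made-up numbers and scipy used for the clustering (the specific data and linkage choice are illustrative, not from the slides): with an outlier gene, globally scaled values for the other genes get crammed together, and the row order that clustering produces can change with the scaling.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)
counts = rng.uniform(5, 25, size=(5, 4))                 # 5 ordinary genes, 4 samples
counts = np.vstack([counts, [900., 950., 870., 920.]])   # plus one outlier gene

def scale_per_gene(x):
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def scale_global(x):
    return (x - x.mean()) / x.std()

# 1) Color/brightness: under global scaling the five ordinary genes all end up
#    with nearly identical values, so their shades are indistinguishable.
print(np.round(scale_global(counts), 2))

# 2) Clustering: the leaf (row) order can differ depending on how we scaled.
print(leaves_list(linkage(scale_per_gene(counts), method="average")))
print(leaves_list(linkage(scale_global(counts), method="average")))
```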
… now back to the action.
How to scale data… Regardless of whether you do it by gene or globally, the most common method is… nameless! I hate to coin a new term, but let’s call it “Z-score scaling” because, technically, it converts the data to “Z-scores”.
Converting to Z-Scores (i.e. Z-score scaling)

RNA-seq read counts from 6 samples.

Step 1) Calculate the mean (16.5).
Step 2) Subtract the mean from each value. This centers the data around 0: samples with relatively high transcription get positive values, and samples with relatively low transcription get negative values.
Step 3) Calculate the standard deviation (6.28).
Step 4) Divide by the standard deviation (notice, the scale on the axis has changed). The data used to be spread from -8 to +8; now it is between -1.2 and 1.2.

The formula for Z-score scaling:
(sample value – the mean) / (the standard deviation), a.k.a. (sᵢ – µ) / s

Regardless of the variation in the original data, dividing by the standard deviation ensures that it’s tightly grouped. Why do we need to ensure that the data is tightly grouped? Because we can only discern so many shades of colors; the wider the range, the more subtle the differences in the shades. By tightly grouping the data, we use fewer shades and it is easier to see that “Sample 1 has more transcription than Sample 2…”
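The four steps are easy to check in code. A minimal sketch with made-up read counts (they happen to average 16.5 like the slide’s example, but the standard deviation comes out near 5.6 rather than 6.28):

```python
import numpy as np

# Hypothetical read counts for one gene across 6 samples.
counts = np.array([8.0, 12.0, 15.0, 18.0, 21.0, 25.0])

mean = counts.mean()          # Step 1) calculate the mean -> 16.5
centered = counts - mean      # Step 2) subtract the mean from each value
sd = counts.std()             # Step 3) calculate the (population) standard deviation
z_scores = centered / sd      # Step 4) divide by the standard deviation

print(mean, round(sd, 2))
print(np.round(z_scores, 2))  # centered around 0 and tightly grouped
```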
A brief aside… What if there is an outlier? The standard deviation will be much larger. That is to say, the denominator in (sample value – the mean) / (the standard deviation) will be larger, so the values near zero will get compressed a lot, and it will be hard to separate them with only a few shades.
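The same sketch with one made-up outlier added shows the problem numerically: the standard deviation (the denominator) blows up, and the remaining z-scores get squeezed into a narrow band.

```python
import numpy as np

def z_scale(x):
    return (x - x.mean()) / x.std()

no_outlier   = np.array([8.0, 12.0, 15.0, 18.0, 21.0, 25.0])
with_outlier = np.array([8.0, 12.0, 15.0, 18.0, 21.0, 250.0])   # one huge value

print(round(no_outlier.std(), 2), round(with_outlier.std(), 2))  # ~5.62 vs ~87.8
print(np.round(z_scale(no_outlier), 2))    # nicely spread out
print(np.round(z_scale(with_outlier), 2))  # ordinary values squeezed between about -0.5 and -0.4
```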
When we did “global scaling” on the dataset with the outlier, we saw exactly this: one gene is clearly highly expressed, but we can’t see any differences in the other genes.
Clustering – The fun part!
There are two main types of clustering: hierarchical and K-means. We’ll focus on hierarchical clustering for now…
Hierarchical Clustering

We start with a small heatmap: 3 samples (#1, #2, #3) and 4 genes (Gene 1 – Gene 4). For this example, we are just going to use clustering to reorder the rows (genes).

Conceptually…
1) Figure out which gene is most similar to gene #1. (Genes #1 and #2 are different; genes #1 and #3 are similar; genes #1 and #4 are similar. However, gene #1 is most similar to gene #3.)
2) Figure out which gene is most similar to gene #2… and then #3, and then #4. (Gene #2 is most similar to gene #4, etc…)
3) Of the different combinations, figure out which two genes are the most similar, and merge them into a cluster. (Genes #1 and #3 are more similar than any other combination, so genes #1 and #3 are now cluster #1.)
4) Go back to step 1, but now treat the new cluster like it’s a single gene. (Cluster #1 is most similar to gene #4, gene #2 is most similar to gene #4, and genes #2 and #4 are the most similar combination, so they become cluster #2. Done!)
Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order in which the clusters were formed: cluster #1 was formed first and is the most similar, cluster #2 was formed second and is the second most similar, and cluster #3, which contains all of the genes, was formed last.
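The whole procedure above (merge the most similar rows, record the merges in a dendrogram, reorder the heatmap) is what scipy’s hierarchical clustering does. A sketch with made-up scaled values, arranged so that genes 1 & 3 and genes 2 & 4 are the similar pairs; the distance and linkage choices here are arbitrary and are exactly the “nit-picky details” covered next.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, leaves_list

# Made-up scaled expression values: 4 genes (rows) x 3 samples (columns),
# chosen so that genes 1 & 3 look alike and genes 2 & 4 look alike.
genes = np.array([
    [ 1.2, -0.8, -0.4],   # gene 1
    [-1.0,  1.1, -0.1],   # gene 2
    [ 1.1, -0.9, -0.2],   # gene 3
    [-0.9,  1.3, -0.5],   # gene 4
])

# Agglomerative (hierarchical) clustering of the rows: repeatedly merge
# the two most similar genes/clusters until everything is in one cluster.
Z = linkage(genes, method="average", metric="euclidean")

# The leaf order is the row order for the heatmap: genes 1 & 3 end up
# next to each other (cluster #1), as do genes 2 & 4 (cluster #2).
print(leaves_list(Z))

# The dendrogram records both the similarity and the order of the merges.
dendrogram(Z, labels=["gene 1", "gene 2", "gene 3", "gene 4"])
plt.show()
```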
Hierarchical Clustering – a few nit-picky details

To figure out which gene is most similar to gene #1, we have to define what “most similar” means! The method for determining similarity is arbitrarily chosen. However, there are some common practices.

1) Euclidean distance between genes:
√[ (difference in sample #1)² + (difference in sample #2)² + (difference in sample …)² ]

To see the Euclidean distance in action, let’s assume there are only two samples and two genes.
You might recognize this as the Pythagorean Theorem. Say the two genes have these scaled values:
Gene 1: 1.6 in sample #1 and 0.5 in sample #2
Gene 2: -0.5 in sample #1 and -1.9 in sample #2
The “distance” between genes #1 and #2 is then
√[ (1.6 – (-0.5))² + (0.5 – (-1.9))² ] = √[ (2.1)² + (2.4)² ]
where 2.1 is the difference between genes #1 and #2 in sample #1, and 2.4 is the difference between them in sample #2.
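That arithmetic is easy to verify; a tiny sketch:

```python
import numpy as np

gene1 = np.array([1.6, 0.5])     # values in sample #1 and sample #2
gene2 = np.array([-0.5, -1.9])

# sqrt((1.6 - (-0.5))^2 + (0.5 - (-1.9))^2) = sqrt(2.1^2 + 2.4^2)
distance = np.sqrt(np.sum((gene1 - gene2) ** 2))
print(round(float(distance), 2))  # 3.19
```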
Hierarchical Clustering – distance metrics

Euclidean distance is just one method… there are lots more, including Manhattan, Canberra, etc. For example, the Manhattan distance is just the sum of the absolute values of the differences:
|difference in sample #1| + |difference in sample #2| + |difference in sample …|
Does the choice of metric matter? Yes, it makes a difference. :(
Using the “Euclidean” distance…
Using the “Manhattan” distance…
But the choice is arbitrary… :(
Hierarchical Clustering – more nit-picky details

Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Well, there are different ways to do that, too. One simple idea is to compare other genes to the average of the cluster’s measurements from each sample. But there are lots more. And these affect the clustering as well… :(
Different Ways To Compare To Clusters

For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane, and that we have already formed two clusters… now we just want to figure out which cluster one last point belongs to. We can compare that point to… the average of the cluster, the closest point in the cluster, the furthest point in the cluster, etc.
Some examples…
Compare points to the furthest point in the cluster. NOTE: This is the default for clustering in R.
Compare points to the cluster average.
Compare points to the closest point in the cluster.
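These three comparisons are usually called complete linkage (furthest point, the R default noted above), average linkage, and single linkage (closest point). In scipy they are chosen with the method argument; a sketch with made-up points on an X-Y plane:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(1)
points = rng.normal(size=(8, 2))   # 8 made-up points on an X-Y plane

# "furthest point"  -> complete linkage (the default in R's hclust)
# "cluster average" -> average linkage
# "closest point"   -> single linkage
for method in ("complete", "average", "single"):
    print(method, leaves_list(linkage(points, method=method)))
```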
In summary, to make a heatmap you:
1) Scale the data (either per gene, per sample, or globally).
2) Cluster the data (either by gene, or by sample, or by both gene and sample), using either:
Hierarchical clustering – discussed in this StatQuest!
K-means – you decide how many clusters there should be, and the computer figures out which samples go in which cluster by trying to minimize some metric of dispersion (i.e. variance). This deserves a separate StatQuest!!!
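Putting the summary together, here is a minimal end-to-end sketch. The data, the per-gene scaling, the Euclidean/average-linkage choices, and the 2 clusters for the K-means alternative are all made up for illustration, not taken from the presentation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(42)
counts = rng.poisson(lam=50, size=(10, 6)).astype(float)  # 10 genes x 6 samples, made up

# 1) Scale the data (here: per gene, i.e. per row).
scaled = (counts - counts.mean(axis=1, keepdims=True)) / counts.std(axis=1, keepdims=True)

# 2) Cluster the data (here: hierarchical clustering of both genes and samples).
gene_order = leaves_list(linkage(scaled, method="average", metric="euclidean"))
sample_order = leaves_list(linkage(scaled.T, method="average", metric="euclidean"))
heatmap_matrix = scaled[gene_order][:, sample_order]  # rows/columns reordered for plotting

# K-means alternative: you pick the number of clusters (here 2) and the computer
# assigns each gene to a cluster while minimizing within-cluster dispersion.
centroids, gene_cluster = kmeans2(scaled, 2)
print(gene_order, sample_order, gene_cluster)
```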
THE END!