Presentation is loading. Please wait.

Presentation is loading. Please wait.

StatQuest! http://www.statquest.com.

Similar presentations


Presentation on theme: "StatQuest! http://www.statquest.com."— Presentation transcript:

1 StatQuest!

2 StatQuest!

3 StatQuest!

4 Heatmaps… You’ve seen them before….

5 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples.

6 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it.

7 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it. 1) The relative abundances have been scaled. In this case, this was done on per gene basis (other heatmaps scale all the genes at once). This makes it easy to see that sample X has more/less of gene Y than sample Z.

8 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it. 1) The relative abundances have been scaled. In this case, this was done on per gene basis (other heatmaps scale all the genes at once). This makes it easy to see that sample X has more/less of gene Y than sample Z. It’s easy to see that Sample 1 expresses this gene more than the others.

9 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it. 1) The relative abundances have been scaled. In this case, this was done on per gene basis (other heatmaps scale all the genes at once). This makes it easy to see that sample X has more/less of gene Y than sample Z. It’s easy to see that Sample 1 expresses this gene more than the others. However, this specific scaling means we can’t compare across genes. The dark red bar in the Sample 1 for this gene doesn’t mean that Sample 1 transcribes it more than other genes, just other samples.

10 The columns are RNA-seq samples.
Here’s a heatmap! The rows are genes. The columns are RNA-seq samples. This data has been modified in 2 ways so that we can gain some insights from it. 1) The relative abundances have been scaled. In this case, this was done on per gene basis (other heatmaps scale all the genes at once). This makes it easy to see that sample X has more/less of gene Y than sample Z. 2) The rows/genes have been grouped according to “similarity”.

11 These genes are transcribed most in the 2nd sample (and least in the 4th sample).

12 These genes are transcribed most in the 2nd sample (and least in the 4th sample).
These genes are transcribed most in the 1st sample (and least in the 4th sample).

13 These genes are transcribed most in the 2nd sample (and least in the 4th sample).
These genes are transcribed most in the 1st sample (and least in the 4th sample). These genes are transcribe most in the 2nd sample (and least in the 3rd sample).

14 The “clustering” isn’t by chance, but due to a computer program that tries to put “similar” things close together.

15 Without clustering the data would look like this…

16 Without clustering or scaling, the data would look like this!!!!

17 Without clustering or scaling, the data would look like this!!!!
Notice that one gene is highly transcribed compared to the others. It’s an outlier…

18 Another example…

19 This heatmap has been scaled and clustered.
The scaling is “global” – not per row/gene – but for all rows/genes.

20 This heatmap has been scaled and clustered.
The scaling is “global” – not per row/gene – but for all rows/genes. We can use “global” scaling because we don’t have an outlier like we did in the last dataset.

21 This heatmap has been scaled and clustered.
The scaling is “global” – not per row/gene – but for all rows/genes. The clustering is by column/sample AND by row/gene.

22 These columns/samples cluster together.
This heatmap has been scaled and clustered. The scaling is “global” – not per row/gene – but for all rows/genes. The clustering is by column/sample AND by row/gene.

23 These columns/samples cluster together.
This heatmap has been scaled and clustered. The scaling is “global” – not per row/gene – but for all rows/genes. The clustering is by column/sample AND by row/gene. These rows/genes cluster together.

24 Without clustering

25 Without clustering or scaling

26 A quick aside....

27 What if we had used global scaling with the first heatmap?

28 Now using global scaling…
The outlier skews the scale so much it is impossible to see the other genes.

29 Now using global scaling…
Also, notice that the clustering changes and the genes have a new order. The outlier skews the scale so much it is impossible to see the other genes.

30 Scaling can affect two things:
How brightly colored the genes are and whether you can compare between them. The clustering. Now using global scaling… Also, notice that the clustering changes and the genes have a new order. The outlier skews the scale so much it is impossible to see the other genes.

31 … now back to the action.

32 How to scale data… Regardless of whether you do it by gene or globally, the most common method is… nameless! I hate to coin a new term, but let’s call it “Z-Score Scaling” because, technically, it converts the data to “Z-scores”

33 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F RNA-seq read counts from 6 samples. 5 10 15 20 25

34 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F RNA-seq read counts from 6 samples. 5 10 15 20 25 Step 1) Calculate the mean (16.5)

35 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -25 -20 -15 -10 -5 5 10 15 20 25 Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value

36 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -25 -20 -15 -10 -5 5 10 15 20 25 This centers the data around 0. Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value

37 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -25 -20 -15 -10 -5 5 10 15 20 25 This centers the data around 0. Samples with relatively high transcription get positive values. Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value

38 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -25 -20 -15 -10 -5 5 10 15 20 25 This centers the data around 0. Samples with relatively high transcription get positive values. Samples with relatively low transcription get negative values. Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value

39 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -25 -20 -15 -10 -5 5 10 15 20 25 Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value Step 3) Calculate the standard deviation (6.28)

40 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value Step 3) Calculate the standard deviation (6.28) Step 4) Divide by the standard deviation (notice, the scale on the axis has changed)

41 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value Step 3) Calculate the standard deviation (6.28) Step 4) Divide by the standard deviation (notice, the scale on the axis has changed) The data used to be spread from -8 to +8. Now it is between -1.2 and 1.2

42 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Step 1) Calculate the mean (16.5) Step 2) Subtract the mean from each value Step 3) Calculate the standard deviation (6.28) Step 4) Divide by the standard deviation (notice, the scale on the axis has changed) The formula for Z-score scaling sample value – the mean the standard deviation si - µ s a.k.a.

43 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Regardless of the variation in the original data, dividing by the standard deviation ensures that it’s tightly grouped.

44 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Regardless of the variation in the original data, dividing by the standard deviation ensures that it’s tightly grouped. Why do we need to ensure that the data is tightly grouped?

45 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Regardless of the variation in the original data, dividing by the standard deviation ensures that it’s tightly grouped. Why do we need to ensure that the data is tightly grouped? Because we can only discern so many shades of colors. The wider the range, the more subtle the difference in the shades.

46 Converting to Z-Scores (i.e. Z-score scaling)
B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5 Regardless of the variation in the original data, dividing by the standard deviation ensures that it’s tightly grouped. Why do we need to ensure that the data is tightly grouped? Because we can only discern so many shades of colors. The wider the range, the more subtle the difference in the shades. By tightly grouping the data, we use fewer shades and it is easier to see, “Sample 1 has more transcription than Sample 2…”

47 A brief aside… What if there is an outlier?

48 A brief aside… What if there is an outlier?
C D E F -25 -20 -15 -10 -5 5 10 15 20 25

49 A brief aside… What if there is an outlier?
C D E F -25 -20 -15 -10 -5 5 10 15 20 25 The standard deviation will be much larger.

50 A brief aside… What if there is an outlier?
C D E F -25 -20 -15 -10 -5 5 10 15 20 25 sample value – the mean the standard deviation The standard deviation will be much larger. That is to say, the denominator will be larger.

51 A brief aside… What if there is an outlier?
C D E F -25 -20 -15 -10 -5 5 10 15 20 25 sample value – the mean the standard deviation The standard deviation will be much larger. That is to say, the denominator will be larger. And the values near zero will get compressed a lot and it will be hard to separate them with only a few shades. A B C D E F -2.5 -2.0 -1.5 -1 -0.5 0.5 1.0 1.5 2.0 2.5

52 When we did “global scaling” on the dataset with the outlier, we saw what happens with an outlier.
One gene is clearly highly expressed, but we can’t see any differences in the other genes.

53 Clustering – The fun part!

54 Clustering – The fun part!
There are two main types of clustering: Hierarchical K-means

55 Clustering – The fun part!
There are two main types of clustering: Hierarchical K-means We’ll focus on hierarchical clustering for now…

56 Hierarchical Clustering
Samples: #1 #2 #3 Gene 1 Gene 2 Gene 3 Gene 4

57 Hierarchical Clustering
Samples: #1 #2 #3 Gene 1 Gene 2 Gene 3 Gene 4 For this example, we are just going to use clustering to reorder the rows (genes).

58 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4

59 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 Genes #1 and #2 are different

60 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 Genes #1 and #3 are similar

61 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 Genes #1 and #4 are similar.

62 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 However, gene #1 is most similar to gene #3.

63 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Figure out which genes is most similar to gene #2... (and then #3 and then #4). Gene 1 Gene 2 Gene 3 Gene 4 Gene #2 is most similar to gene #4 (etc…)

64 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge them into a cluster. Gene 1 Gene 2 Gene 3 Gene 4

65 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge them into a cluster. Gene 1 Gene 2 Gene 3 Gene 4 Genes #1 and #3 are more similar than any other combination.

66 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge them into a cluster. Gene 1 Gene 3 Gene 2 Gene 4 Genes #1 and #3 are now cluster #1. Cluster #1

67 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1. Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge those into a cluster. Go back to step 1, but now treat the new cluster like it’s a single gene. Gene 1 Gene 3 Gene 2 Gene 4 Cluster #1

68 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1/cluster #1 Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge those into a cluster. Go back to step 1, but now treat the new cluster like it’s a single gene. Gene 1 Gene 3 Gene 2 Gene 4 Cluster #1 Cluster #1 is most similar to gene #4

69 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1/cluster #1 Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge those into a cluster. Go back to step 1, but now treat the new cluster like it’s a single gene. Gene 1 Gene 3 Gene 2 Gene 4 Cluster #1 Gene #2 is most similar to gene #4

70 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1/cluster #1 Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge those into a cluster. Go back to step 1, but now treat the new cluster like it’s a single gene. Gene 1 Gene 3 Gene 2 Gene 4 Cluster #1 Genes #2 and #4 are the most similar combination.

71 Hierarchical Clustering
Samples: Conceptually… #1 #2 #3 Figure out which gene is most similar to gene #1/cluster #1 Figure out which genes is most similar to gene #2... (and then #3 and then #4). Of the different combinations, figures out which two genes are the most similar. Merge those into a cluster. Go back to step 1, but now treat the new cluster like it’s a single gene. Cluster #1 Cluster #2 Done!

72 Hierarchical Clustering
Samples: #1 #2 #3 Cluster #1 Cluster #2 Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order that the clusters were formed.

73 Hierarchical Clustering
Samples: #1 #2 #3 Cluster #1 was formed first and is most similar Cluster #1 Cluster #2 Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order that the clusters were formed.

74 Hierarchical Clustering
Samples: #1 #2 #3 Cluster #1 Cluster #2 was second and is the second most similar. Cluster #2 Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order that the clusters were formed.

75 Hierarchical Clustering
Samples: Cluster #3, which contains all of the genes, was formed last. #1 #2 #3 Cluster #1 Cluster #2 Hierarchical clustering is usually accompanied by a “dendrogram”. It indicates both the similarity and the order that the clusters were formed.

76 Hierarchical Clustering – a few nit-picky details

77 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4

78 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 We have to define what “most similar” means!

79 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 The method for determining similarity is arbitrarily chosen. However, there are some common practices. 1) Euclidian distance between genes:

80 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 The method for determining similarity is arbitrarily chosen. However, there are some common practices. 1) Euclidian distance between genes: (difference in sample #1)2+ (difference in sample #2)2 + (difference in sample…)2

81 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 #3 Figure out which gene is most similar to gene #1. Gene 1 Gene 2 Gene 3 Gene 4 The method for determining similarity is arbitrarily chosen. However, there are some common practices. 1) Euclidian distance between genes: (difference in sample #1)2+ (difference in sample #2)2 + (difference in sample…)2 To see the Euclidian distance in action, let’s assume there are only two samples and two genes.

82 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 Gene 1 Gene 2 (difference in sample #1)2+ (difference in sample #2)2 + (difference in sample…)2 To see the Euclidian distance in action, let’s assume there are only two samples and two genes.

83 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 Gene 1 Gene 2 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

84 Hierarchical Clustering – a few nit-picky details
Samples: #1 #2 Gene 1 Gene 2 1.6 0.5 -0.5 -1.9 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

85 Hierarchical Clustering – a few nit-picky details
Samples: (1.6 – (-0.5))2 + (0.5 – (-1.9))2 #1 #2 Gene 1 Gene 2 1.6 0.5 -0.5 -1.9 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

86 Hierarchical Clustering – a few nit-picky details
Samples: (1.6 – (-0.5))2 + (0.5 – (-1.9))2 #1 #2 Gene 1 Gene 2 1.6 0.5 Sample #1: the difference between genes #1 and #2 -0.5 -1.9 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

87 Hierarchical Clustering – a few nit-picky details
Samples: (1.6 – (-0.5))2 + (0.5 – (-1.9))2 #1 #2 Gene 1 Gene 2 1.6 0.5 Sample #1: the difference between genes #1 and #2 Sample #2: The difference between genes #1 and #2 -0.5 -1.9 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

88 Hierarchical Clustering – a few nit-picky details
Samples: (1.6 – (-0.5))2 + (0.5 – (-1.9))2 #1 #2 Gene 1 Gene 2 1.6 0.5 This is the “distance” between genes #1 and #2. (2.1)2 + (2.4)2 -0.5 -1.9 2.4 2.1 (difference in sample #1)2+ (difference in sample #2)2 + (difference in gene …)2 You might recognize this as the Pythagorean Theorem.

89 Hierarchical Clustering – distance metrics
Euclidian distance is just one method… there are lots more, including: Manhattan Canberra etc.

90 Hierarchical Clustering – distance metrics
Euclidian distance is just one method… there are lots more, including: Manhattan Canberra etc. For example, the Manhattan distance is just the absolute value of the differences…. |difference in sample #1|+ |difference in sample #2| + |difference in gene …|

91 Hierarchical Clustering – distance metrics
Euclidian distance is just one method… there are lots more, including: Manhattan Canberra etc. Yes, it makes a difference. For example, the Manhattan distance is just the absolute value of the differences…. |difference in sample #1|+ |difference in sample #2| + |difference in gene …|

92 Hierarchical Clustering – distance metrics
Euclidian distance is just one method… there are lots more, including: Manhattan Canberra etc. Yes, it makes a difference. :( For example, the Manhattan distance is just the absolute value of the differences…. |difference in sample #1|+ |difference in sample #2| + |difference in gene …|

93 Using the “Euclidean” distance…

94 Using the “Euclidean” distance…
Using the “Manhattan” distance…

95 Using the “Euclidean” distance…
Using the “Manhattan” distance… But the choice is arbitrary…

96 Using the “Euclidean” distance…
Using the “Manhattan” distance… But the choice is arbitrary… :(

97 Hierarchical Clustering – more nit-picky details
Samples: #1 #2 #3 Gene 1 Gene 3 Gene 2 Gene 4 Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Cluster #1

98 Hierarchical Clustering – more nit-picky details
Samples: #1 #2 #3 Gene 1 Gene 3 Gene 2 Gene 4 Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Well, there are different ways to do that, too. Cluster #1

99 Hierarchical Clustering – more nit-picky details
Samples: #1 #2 #3 Gene 1 Gene 3 Gene 2 Gene 4 Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Well, there are different ways to do that, too. One simple idea is to compare other genes to the average of the measurements from each sample. But there are lots more. Cluster #1

100 Hierarchical Clustering – more nit-picky details
Samples: #1 #2 #3 Gene 1 Gene 3 Gene 2 Gene 4 Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Well, there are different ways to do that, too. One simple idea is to compare other genes to the average of the measurements from each sample. But there are lots more. And these effect clustering as well… Cluster #1

101 Hierarchical Clustering – more nit-picky details
Samples: #1 #2 #3 Gene 1 Gene 3 Gene 2 Gene 4 Do you remember how we merged genes #1 and #3 into cluster #1 and compared it to other genes? Well, there are different ways to do that, too. One simple idea is to compare other genes to the average of the measurements from each sample. But there are lots more. And these effect clustering as well… :( Cluster #1

102 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane.

103 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. Now imagine that we have already formed these two clusters…

104 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. … and we just want to figure out which cluster this last point belongs to.

105 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. We can compare that point to… The average

106 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. We can compare that point to… The average The closest point

107 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. We can compare that point to… The average The closest point The furthest point

108 Different Ways To Compare To Clusters
For the sake of visualizing how the different methods work, imagine our data was spread out on an X-Y plane. We can compare that point to… The average The closest point The furthest point etc.

109 Some examples… Compare points to the furthest in the cluster.
NOTE: This is the default for clustering in R.

110 Some examples… Compare points to the furthest in the cluster.
NOTE: This is the default for clustering in R. Compare points to the cluster average

111 Some examples… Compare points to the furthest in the cluster.
NOTE: This is the default for clustering in R. Compare points to the cluster average Compare points to the closest in the cluster.

112 In summary, to make a heatmap you:

113 In summary, to make a heatmap you:
Scale the data (either per gene, or globally). Cluster the data (either by gene, or sample, or both gene and sample) Hierarchical Clustering Discussed in this StatQuest! K-Means You decide how many clusters there should be The computer figures out which samples go in which cluster by trying to minimize some metric of dispersion (i.e. variance).

114 In summary, to make a heatmap you:
Scale the data (either per gene, per sample, or globally). Cluster the data (either by gene, or sample, or both gene and sample) Hierarchical Clustering Discussed in this StatQuest! K-Means You decide how many clusters there should be The computer figures out which samples go in which cluster by trying to minimize some metric of dispersion (i.e. variance).

115 In summary, to make a heatmap you:
Scale the data (either per gene, per sample, or globally). Cluster the data (either by gene, or sample, or both gene and sample) Hierarchical Clustering Discussed in this StatQuest! K-Means You decide how many clusters there should be The computer figures out which samples go in which cluster by trying to minimize some metric of dispersion (i.e. variance).

116 In summary, to make a heatmap you:
Scale the data (either per gene, per sample, or globally). Cluster the data (either by gene, or sample, or both gene and sample) Hierarchical Clustering Discussed in this StatQuest! K-Means You decide how many clusters there should be The computer figures out which samples go in which cluster by trying to minimize some metric of dispersion (i.e. variance). This deserves a separate StatQuest!!!

117 THE END!


Download ppt "StatQuest! http://www.statquest.com."

Similar presentations


Ads by Google