Published by Maud Miles. Modified over 9 years ago.
1
1 Limma homework
Is it possible that some of these gene expression changes are miscalled (i.e. biologically significant but with an insignificant p-value, or vice versa), and why? What other criteria might you use to distinguish genes you care about?
How many genes pass the cutoff of q < 0.01, and how does this compare to the number of genes that pass the Bonferroni-corrected p-value? 1920 genes have q < 0.01 … only 20 genes have p < 1.87 x 10^-6 (0.01 / 5338 t-tests).
What does using a q-value cutoff of 0.01 correspond to? About 1% of selected genes (i.e. about 19 of the 1920 genes with q < 0.01) could be false positives.
Sensitivity: of the 66 known Hsf1 targets (we only had data for 62), 40 had q < 0.01. Sensitivity: 40/62 = 64.5%
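The arithmetic behind these homework answers can be checked with a few lines of Python. This is only a sketch of the calculations quoted on the slide; the variable names are made up for illustration, and the counts (5338 tests, 1920 genes, 40 of 62 targets) are taken directly from the slide.

```python
# Numbers quoted on the slide
n_tests = 5338          # number of t-tests (genes tested)
alpha = 0.01            # cutoff used for both q-values and Bonferroni
n_q_pass = 1920         # genes with q < 0.01
targets_with_data = 62  # known Hsf1 targets that had data
targets_detected = 40   # of those, targets with q < 0.01

# Bonferroni: divide the desired p-value cutoff by the number of tests
bonferroni_cutoff = alpha / n_tests

# A q-value cutoff of 0.01 means roughly 1% of the selected genes
# are expected to be false positives
expected_false_positives = alpha * n_q_pass

# Sensitivity: fraction of known targets that were recovered
sensitivity = targets_detected / targets_with_data

print(f"Bonferroni p-value cutoff: {bonferroni_cutoff:.3g}")
print(f"Expected false positives:  {expected_false_positives:.1f}")
print(f"Sensitivity:               {sensitivity:.1%}")
```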
2
2 LAST TIME:
Gene X:   X_1            X_2            X_3
          Array 1        Array 2        Array 3
          x coordinate   y coordinate   z coordinate
3
3 LAST TIME: 4. Centroid linkage clustering, using the 'centroid' (average vector)
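As a rough illustration of the "average vector" idea, the sketch below computes a centroid for each cluster and takes the distance between two clusters to be the distance between their centroids. The function names are hypothetical, and Euclidean distance is an assumption made here for simplicity; the slide does not say which distance measure is used.

```python
def centroid(vectors):
    """Average vector: the mean of each coordinate across all members."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid_linkage_distance(cluster_a, cluster_b):
    """Distance between two clusters = distance between their centroids."""
    return euclidean(centroid(cluster_a), centroid(cluster_b))


# Two small clusters of genes measured on three arrays
cluster_a = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
cluster_b = [[2.0, 3.0, 8.0]]

center_a = centroid(cluster_a)
dist_ab = centroid_linkage_distance(cluster_a, cluster_b)
print(center_a, dist_ab)
```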
4
4 Gene X:   X_1       X_2       X_3       X_4       X_5
            Array 1   Array 2   Array 3   Array 4   Array 5
  Gene Y:   Y_1       Y_2       Y_3       Y_4       Y_5
Pearson correlation:
$$S_{x,y} = \frac{\frac{1}{N}\sum_{i=1}^{N} X_i Y_i}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} X_i^{2}}\,\sqrt{\frac{1}{N}\sum_{i=1}^{N} Y_i^{2}}}$$
Sometimes, want to use the weighted Pearson correlation.
For example: if arrays 3, 4, and 5 are identical, the data are over-represented 3X.
5
5 Gene X:   X_1       X_2       X_3       X_4       X_5
            Array 1   Array 2   Array 3   Array 4   Array 5
  Gene Y:   Y_1       Y_2       Y_3       Y_4       Y_5
Sometimes, want to use the weighted Pearson correlation.
For example: if arrays 3, 4, and 5 are identical, the data are over-represented 3X -- can weight experiments i = 3, 4, 5 by w = 0.33
Weighted Pearson correlation:
$$S_{x,y} = \frac{\frac{1}{N}\sum_{i=1}^{N} w_i X_i Y_i}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} w_i X_i^{2}}\,\sqrt{\frac{1}{N}\sum_{i=1}^{N} w_i Y_i^{2}}}$$
Where $w_i = 1 / L_i$
k = array correlation cutoff
d = Pearson distance (= 1 - Pearson correlation)
n = exponent (usually 1)
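The weighting idea can be sketched in Python as follows. Two assumptions are worth stating up front. First, `weighted_pearson` follows the uncentered form of the correlation with the weight applied inside every sum (the 1/N factors cancel, so they are omitted). Second, the slide does not spell out how L_i is computed from k, d, and n; `density_weights` assumes the local density score described in the Cluster 3.0 manual, where L_i sums ((k - d(i,j)) / k)^n over all arrays j with d(i,j) < k. Both function names are hypothetical.

```python
import math


def weighted_pearson(x, y, w=None):
    """Weighted (uncentered) correlation; w defaults to all ones (unweighted)."""
    if w is None:
        w = [1.0] * len(x)
    numerator = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    return numerator / (norm_x * norm_y)


def density_weights(dist, k, n=1):
    """w_i = 1 / L_i, with L_i assumed to be the sum over arrays j with
    d(i, j) < k of ((k - d(i, j)) / k) ** n. Array i itself has d = 0,
    so L_i is always at least 1."""
    weights = []
    for row in dist:
        local_density = sum(((k - d) / k) ** n for d in row if d < k)
        weights.append(1.0 / local_density)
    return weights


# Arrays 3, 4 and 5 are identical (distance 0 to each other);
# every other pair of arrays is far apart (distance 0.8).
dist = [
    [0.0, 0.8, 0.8, 0.8, 0.8],
    [0.8, 0.0, 0.8, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.0, 0.0],
    [0.8, 0.8, 0.0, 0.0, 0.0],
    [0.8, 0.8, 0.0, 0.0, 0.0],
]
w = density_weights(dist, k=0.1, n=1)  # arrays 3-5 each get weight 1/3

gene_x = [1.0, -2.0, 0.5, 0.5, 0.5]
gene_y = [2.0, -1.0, 1.5, 1.5, 1.5]

r_unweighted = weighted_pearson(gene_x, gene_y)
r_weighted = weighted_pearson(gene_x, gene_y, w)
# Same genes with the duplicated arrays collapsed to a single array
r_deduplicated = weighted_pearson(gene_x[:3], gene_y[:3])

print(w, r_unweighted, r_weighted, r_deduplicated)
```

In this toy example the weighted correlation on all five arrays matches the correlation computed after collapsing the three identical arrays into one, which is the effect the weighting is meant to achieve.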
6
6 [Figure: clustering with unweighted Pearson correlation vs. weighted Pearson correlation]
8
8 Can also cluster array experiments based on global similarity in expression (Alizadeh et al. 2000)
9
9 Hierarchical trees of gene expression data are analogous to phylogenetic trees
[Figure: tree with leaves A B C D F E]
-- Distance between genes is proportional to the total branch length between genes (not the distance on the y-axis)
-- Orientation of the nodes is irrelevant … although some clustering programs try to organize nodes in some way.
10
10 [Figure: the tree from the previous slide shown again with leaf order C F E D A B]
11
11 Genes involved in the same cellular process are often coregulated. These genes may not have the same annotation, but they still function together and are thus co-expressed.
12
12 M choose i = number of possible groups of size i composed of the M objects:
$$\binom{M}{i} = \frac{M!}{(M-i)!\; i!}$$
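A quick way to evaluate this formula is shown below. The helper `n_choose` is a hypothetical name that implements the factorial expression directly; Python's built-in `math.comb` (available from Python 3.8) computes the same quantity and is used here as a cross-check.

```python
import math


def n_choose(M, i):
    """Number of groups of size i that can be drawn from M objects."""
    return math.factorial(M) // (math.factorial(M - i) * math.factorial(i))


# Example: how many distinct pairs can be formed from 5 genes?
pairs_from_five = n_choose(5, 2)
print(pairs_from_five, math.comb(5, 2))
```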
13
13 Advantages and Disadvantages of Hierarchical clustering
Advantages:
1) Straightforward
2) Captures biological information relatively well
Disadvantages:
1) Doesn't give discrete clusters … need to define clusters with cutoffs
2) Hierarchical arrangement does not always represent data appropriately -- sometimes a hierarchy is not appropriate: genes can belong only to one cluster.
3) Get different clustering for different experiment sets
THERE IS NO ONE PERFECT CLUSTERING METHOD
14
14 Partitioning (or top-down) clustering method: k-means clustering
-- Randomly split the data into k groups with an equal number of genes
-- Calculate the centroid of each group
-- Reassign each gene to the centroid to which it is most similar
-- Calculate a new centroid for each group, reassign genes, etc … iterate until stable
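The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names are hypothetical, similarity is measured with squared Euclidean distance (an assumption, since the slide does not name a metric), and a group that becomes empty simply keeps its previous centroid.

```python
import random


def centroid(points):
    """Average vector of a group of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)

    # Step 1: randomly split the data into k groups of (nearly) equal size
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k

    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: calculate the centroid of each group
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an empty group keeps its previous centroid
                centroids[g] = centroid(members)

        # Step 3: reassign each gene to the most similar centroid
        new_labels = [
            min(range(k), key=lambda g: sq_dist(p, centroids[g]))
            for p in points
        ]

        # Step 4: iterate until the assignment is stable
        if new_labels == labels:
            break
        labels = new_labels

    return labels, centroids


# Two well-separated groups of "genes" measured on two arrays
genes = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(genes, k=2, seed=0)
print(labels)
```

Changing `seed` changes the initial random split, and on less cleanly separated data it can change the final clusters as well.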
17
17 Partitioning (or top-down) clustering method: k-means clustering
What are the disadvantages of k-means clustering?
- Need to know how many clusters to ask for (can define this empirically)
- Genes are not organized within each cluster (can hierarchically cluster genes afterwards or use SOM analysis)
- The random starting split makes this a non-deterministic method: repeated runs can give different clusters