1 Limma homework Is it possible that some of these gene expression changes are miscalled (i.e. biologically significant but insignificant p value and vice.

1 Limma homework
Is it possible that some of these gene expression changes are miscalled (i.e. biologically significant but with an insignificant p-value, and vice versa), and why? What other criteria might you use to distinguish genes you care about?
How many genes pass the cutoff of q < 0.01, and how does this compare to the number of genes that pass the Bonferroni-corrected p-value?
-- 1920 genes have q < 0.01
-- only 20 genes have p < 1.87 × 10^-6 (0.01 / 5338 t-tests)
What does using a q-value cutoff of 0.01 correspond to?
-- About 1% of selected genes (i.e. 19 genes out of 1920 with q < 0.01) could be false positives.
Sensitivity: of 66 known Hsf1 targets (we only had data for 62), 40 had q < 0.01.
-- Sensitivity: 40/62 = 64.5%
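
The arithmetic behind the homework answers can be checked with a few lines of Python. This is only a sketch: the numbers come from the slide, while the variable names are illustrative and not part of limma.

```python
# Numbers quoted on the slide; the variable names are illustrative only.
n_tests = 5338            # number of t-tests (genes tested)
alpha = 0.01              # cutoff used for both q-values and Bonferroni
n_q_pass = 1920           # genes with q < 0.01
targets_with_data = 62    # known Hsf1 targets that had data
targets_called = 40       # of those, targets with q < 0.01

# Bonferroni: divide the cutoff by the number of tests.
bonferroni_cutoff = alpha / n_tests

# A q-value cutoff of 0.01 means roughly 1% of the selected genes
# are expected to be false positives.
expected_false_positives = alpha * n_q_pass

# Sensitivity: fraction of known targets that were recovered.
sensitivity = targets_called / targets_with_data

print(f"Bonferroni cutoff: {bonferroni_cutoff:.3g}")
print(f"Expected false positives among {n_q_pass} genes: {expected_false_positives:.1f}")
print(f"Sensitivity: {sensitivity:.1%}")
```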

2 LAST TIME: Gene X = (X1, X2, X3). Its values on Array 1, Array 2 and Array 3 serve as the x, y and z coordinates of a point.

3 LAST TIME: 4. Centroid linkage clustering, based on the 'centroid' (average vector) of each cluster
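
A minimal sketch of centroid linkage, assuming Euclidean distance between centroids (the slide does not say which distance was used). The function names are hypothetical, and the merge loop is written for clarity rather than speed.

```python
import math

def centroid(vectors):
    """Average vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance (an assumed choice of distance)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid_linkage(data):
    """Repeatedly merge the two clusters whose centroids are closest.

    Returns a list of merges: (members_a, members_b, centroid_distance).
    """
    clusters = {i: [i] for i in range(len(data))}
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = euclidean(
                    centroid([data[i] for i in clusters[a]]),
                    centroid([data[i] for i in clusters[b]]),
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Toy expression vectors (made-up values): two tight pairs far apart.
toy_genes = [[0, 0], [0, 1], [10, 10], [10, 11]]
for members_a, members_b, dist in centroid_linkage(toy_genes):
    print(members_a, "+", members_b, f"at centroid distance {dist:.2f}")
```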

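4 Gene X: X1 X2 X3 X4 X5 (Array 1 to Array 5)
Gene Y: Y1 Y2 Y3 Y4 Y5 (Array 1 to Array 5)
Pearson correlation between genes X and Y over N arrays, with $X_i$ and $Y_i$ expressed as deviations from each gene's mean:
$$S_{x,y} = \frac{\frac{1}{N}\sum_{i=1}^{N} X_i Y_i}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} X_i^{2}}\;\sqrt{\frac{1}{N}\sum_{i=1}^{N} Y_i^{2}}}$$
Sometimes, want to use the weighted Pearson correlation. For example: if three of these arrays are identical, those data are over-represented 3X.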
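
As a small illustration, the unweighted Pearson correlation between two genes can be computed as below. This sketch assumes the centered form (each gene's mean is subtracted first); the function name and example values are made up.

```python
import math

def pearson(x, y):
    """Centered Pearson correlation between two equal-length vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    covariance = sum(a * b for a, b in zip(dx, dy)) / n
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    # Note: a gene that is constant across arrays has sd = 0, so the
    # correlation is undefined and this line would raise ZeroDivisionError.
    return covariance / (sd_x * sd_y)

# Made-up log ratios for two genes across five arrays.
gene_x = [1.0, -0.5, 2.0, 0.3, -1.2]
gene_y = [0.8, 0.1, 1.5, 0.2, -0.9]
print(f"Pearson correlation: {pearson(gene_x, gene_y):.3f}")
```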

5 Gene X: X1 X2 X3 X4 X5 (Array 1 to Array 5)
Gene Y: Y1 Y2 Y3 Y4 Y5 (Array 1 to Array 5)
Weighted Pearson correlation:
$$S_{x,y} = \frac{\dfrac{\sum_{i=1}^{N} w_i X_i Y_i}{\sum_{i=1}^{N} w_i}}{\sqrt{\dfrac{\sum_{i=1}^{N} w_i X_i^{2}}{\sum_{i=1}^{N} w_i}}\;\sqrt{\dfrac{\sum_{i=1}^{N} w_i Y_i^{2}}{\sum_{i=1}^{N} w_i}}}$$
Sometimes, want to use the weighted Pearson correlation. For example: if arrays 3, 4 and 5 are identical, the data are over-represented 3X -- can weight experiments i = 3, 4, 5 by w = 0.33.
Where $w_i = 1/L_i$, with $L_i$ defined in terms of:
k = array correlation cutoff
d = Pearson distance (= 1 - Pearson correlation)
n = exponent (usually 1)
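
The effect of down-weighting replicated arrays can be sketched as follows. The code assumes the centered, weight-normalized form of the correlation and takes the weights as given; it does not compute them from k, d and n. The function name and data are illustrative.

```python
import math

def weighted_pearson(x, y, w):
    """Weighted, centered Pearson correlation with per-array weights w."""
    if not (len(x) == len(y) == len(w)) or not x:
        raise ValueError("x, y and w must be non-empty and the same length")
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)

# Made-up data in which arrays 3, 4 and 5 are identical replicates.
gene_x = [1.0, -0.5, 2.0, 2.0, 2.0]
gene_y = [0.8, 0.1, 1.5, 1.5, 1.5]

equal_weights = [1, 1, 1, 1, 1]
down_weighted = [1, 1, 1 / 3, 1 / 3, 1 / 3]

r_unweighted = weighted_pearson(gene_x, gene_y, equal_weights)
r_weighted = weighted_pearson(gene_x, gene_y, down_weighted)
# Same data with the three replicates collapsed into one array.
r_collapsed = weighted_pearson(gene_x[:3], gene_y[:3], [1, 1, 1])

print(f"unweighted: {r_unweighted:.4f}")
print(f"weighted:   {r_weighted:.4f}")
print(f"collapsed:  {r_collapsed:.4f}")
```

With weights of 1/3 on the three identical arrays, the weighted result matches what you get by keeping just one copy of the replicate, which is the point of the weighting.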

6 [Figure panels: Unweighted Pearson correlation; Weighted Pearson correlation]

7 [Figure panels: Unweighted Pearson correlation; Weighted Pearson correlation]

8 Can also cluster array experiments based on global similarity in expression (Alizadeh et al. 2000)

9 Hierarchical trees of gene expression data are analogous to phylogenetic trees. [Figure: tree with leaves in the order A B C D F E]
Distance between genes is proportional to the total branch length between genes (not the distance on the y-axis).
Orientation of the nodes is irrelevant, although some clustering programs try to organize nodes in some way.

10 Hierarchical trees of gene expression data are analogous to phylogenetic trees (continued). [Figure: tree with leaves A B C D F E shown again with leaves in the order C F E D A B, illustrating that the orientation of the nodes is irrelevant]

11 Genes involved in the same cellular process are often coregulated. These genes may not have the same annotation, but they still function together and are thus co-expressed.

12 M choose i = number of possible groups of size i that can be formed from M objects:
$$\binom{M}{i} = \frac{M!}{(M-i)!\; i!}$$
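
A quick numerical check of the formula, using Python's standard library. The helper name `n_choose` is illustrative; `math.comb` is the built-in equivalent (Python 3.8+).

```python
import math

def n_choose(m, i):
    """Number of groups of size i that can be drawn from m objects."""
    return math.factorial(m) // (math.factorial(m - i) * math.factorial(i))

# Example: number of distinct pairs among 5 objects.
pairs_from_five = n_choose(5, 2)
print(pairs_from_five)

# The hand-written formula agrees with the standard-library function.
print(n_choose(20, 4) == math.comb(20, 4))
```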

13 Advantages and Disadvantages of Hierarchical clustering
Advantages:
1) Straightforward
2) Captures biological information relatively well
Disadvantages:
1) Doesn't give discrete clusters ... need to define clusters with cutoffs
2) Hierarchical arrangement does not always represent data appropriately -- sometimes a hierarchy is not appropriate: genes can belong only to one cluster.
3) Get different clustering for different experiment sets
THERE IS NO ONE PERFECT CLUSTERING METHOD

14 k-means clustering: a partitioning (or top-down) clustering method
-- Randomly split the data into k groups, each with an equal number of genes
-- Calculate the centroid of each group
-- Reassign each gene to the centroid to which it is most similar
-- Calculate a new centroid for each group, reassign genes, etc. ... iterate until stable

15 k-means clustering (continued) [Figure label: Centroids]

16 k-means clustering (continued)
What are the disadvantages of k-means clustering?

17 k-means clustering (continued)
What are the disadvantages of k-means clustering?
-- Need to know how many clusters to ask for (can define this empirically)
-- Genes are not organized within each cluster (can hierarchically cluster genes afterwards or use SOM analysis)
-- Random process makes this a non-deterministic method: different runs can give different clusters
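
The k-means steps from the slides (random split into k equal groups, compute centroids, reassign, iterate until stable) can be sketched in plain Python. This is a minimal illustration rather than a production implementation: it assumes Euclidean distance as the similarity measure, and the function names, the `seed` parameter and the empty-group handling are choices made here, not taken from the slides.

```python
import math
import random

def centroid(vectors):
    """Average vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance (an assumed choice of similarity measure)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(data, k, seed=0, max_iter=100):
    """Return (labels, centroids) following the steps on the slide."""
    rng = random.Random(seed)
    # Step 1: randomly split the genes into k groups of (nearly) equal size.
    order = list(range(len(data)))
    rng.shuffle(order)
    labels = [0] * len(data)
    for position, gene_index in enumerate(order):
        labels[gene_index] = position % k
    centroids = []
    for _ in range(max_iter):
        # Step 2: calculate the centroid of each group.
        centroids = []
        for c in range(k):
            members = [data[i] for i in range(len(data)) if labels[i] == c]
            # Assumption: an emptied group is reseeded from a random gene.
            centroids.append(centroid(members) if members
                             else list(rng.choice(data)))
        # Step 3: reassign each gene to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(point, centroids[c]))
            for point in data
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Made-up expression vectors forming two well-separated groups.
toy_genes = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
             [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
toy_labels, toy_centroids = kmeans(toy_genes, k=2, seed=1)
print(toy_labels)
```

Changing `seed` changes the initial random split. On this toy data the two groups are always recovered (only the label numbering may differ), but on real expression data different seeds can give different clusterings, which is the non-determinism listed among the disadvantages.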
