RedundancyMiner A novel method of clustering in genomic studies Barry Zeeberg, NCI Hongfang Liu, NCI and GU.


1 RedundancyMiner A novel method of clustering in genomic studies Barry Zeeberg, NCI Hongfang Liu, NCI and GU

2 Gene Ontology (GO): AmiGO browser. Hierarchical organization of categories and mapped genes

3 High-Throughput GoMiner (HTGM)

4 Typical HTGM result: clustered image map (CIM)

5 Redundancy problem: Because of the hierarchical nature of the GO structure, parent-child categories may contain partially redundant gene mappings. This can "inflate" the number of categories in the CIM and thus obscure the core information content of the CIM. The redundancy itself can be studied to examine fine-detail, nuanced associations of category clusters.

6 RedundancyMiner (RM) is an attempt to solve that problem. Remove the redundancy from the CIM: redundancy causes the CIM to be inflated by e.g. 3-fold. Place the redundancy into a META CIM: study the redundancy as nuanced themes of association among groups of GO categories.

7 RM paradigm: The similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories. Groups in the META CIM follow a "complete linkage" criterion for a selected p-value threshold.
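
The slide does not say which probabilistic test is used for the similarity metric. As a minimal sketch, assuming a one-sided hypergeometric tail (a common choice for scoring gene-set overlap, not necessarily what RM does), the p-value for two categories sharing at least the observed number of genes could be computed as follows. The function name `overlap_p_value` and the `total_genes` parameter are illustrative only.

```python
from math import comb


def overlap_p_value(genes_a, genes_b, total_genes):
    """P(overlap >= observed) if category B's genes were drawn at random.

    Assumption: hypergeometric null model over `total_genes` genes.
    The slides only say the metric is "probabilistic".
    """
    a, b = set(genes_a), set(genes_b)
    observed = len(a & b)
    n_a, n_b = len(a), len(b)
    denom = comb(total_genes, n_b)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_a, i) * comb(total_genes - n_a, n_b - i)
        for i in range(observed, min(n_a, n_b) + 1)
    )
    return tail / denom
```

A small p-value means the two categories share more genes than chance would suggest, so they are candidates for the same redundancy group.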

8 RM overcomes two problems of traditional hierarchical clustering: (1) all objects are put into one cluster or another, even if an object truly is an outlier; (2) each object can appear in only one cluster, even though it may be related to several clusters.
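
One way to read a "complete linkage" group at a fixed threshold is as a set of objects in which every pair passes the threshold, i.e. a clique. The sketch below enumerates maximal cliques with the Bron-Kerbosch algorithm to show how such groups can overlap and how outliers stay ungrouped. This is an illustration under that assumption, not RM's actual procedure; `maximal_groups` and the toy relation are hypothetical.

```python
def maximal_groups(items, related):
    """Return maximal groups (size >= 2) in which every pair is related.

    `related(x, y)` is assumed symmetric, e.g. "p-value <= threshold".
    """
    neighbors = {
        x: {y for y in items if y != x and related(x, y)} for x in items
    }
    groups = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch: `current` is a clique; grow it with candidates.
        if not candidates and not excluded:
            if len(current) > 1:  # singletons are outliers, not groups
                groups.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v],
                   excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return groups


# Toy relation: A, B, C are mutually related; C is also related to D;
# E is related to nothing.
pairs = {frozenset(p) for p in ("AB", "AC", "BC", "CD")}
groups = maximal_groups("ABCDE", lambda x, y: frozenset((x, y)) in pairs)
```

In this toy case C belongs to two groups and E belongs to none, which are exactly the two behaviors traditional hierarchical clustering does not allow.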

9 CIM after RM

10 META CIM

11 Additional example: gene expression in NCI-60 cell lines. NCI-60 is a set of 60 well-studied cancer cell lines, composed of around 5 or 6 lines from each of around 8 or 9 different cancer types.

12 Problem: The full CIM of 60 cell lines x 20,000 gene expression values is too dense to allow meaningful viewing. Solution: select a sub-portion of the CIM based on RM analysis.

13 NCI-60 META CIM based on correlation threshold = 0.20
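
The slide gives only the threshold value. As a rough sketch, assuming the threshold is applied to pairwise Pearson correlation of gene expression profiles across cell lines (an assumption, not stated in the slides), related gene pairs could be selected like this. The gene names and `correlated_pairs` helper are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlated_pairs(profiles, threshold=0.20):
    """List gene pairs whose correlation meets the threshold."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(profiles[a], profiles[b]) >= threshold
    ]


# Hypothetical expression profiles over four cell lines.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
pairs_above = correlated_pairs(profiles)
```

The resulting pairs can serve as the "related" relation when forming groups for a META CIM.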

14 Sub-CIM of highest correlating genes from group 33. Gene expression values are adjusted z-scores. Red = positive z-score; green = negative z-score.
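
The slides do not describe how the z-scores are "adjusted". The sketch below therefore shows only a plain z-score per gene, using the sample standard deviation (an assumption), together with the red/green color rule from the slide. The `"neutral"` label for a z-score of exactly zero is also an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one gene's expression values across cell lines."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed)
    return [(v - mu) / sigma for v in values]


def cim_color(z):
    """Color rule from the slide: red if positive, green if negative."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"  # not specified in the slides


colors = [cim_color(z) for z in z_scores([1.0, 2.0, 3.0])]
```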

15 Sub-CIM of highest correlating genes from group 32

16 Conclusions: RM can remove redundancy from the primary CIM. RM can display the nuanced themes of redundancy structure in the META CIM. The META CIM can be used as the basis of further investigation.

